Summary of Significant Findings.

In a) 1, we articulate, via an example from reliability, the difference between the notions of probability, chance, likelihood, vagueness, belief and plausibility. To the best of our knowledge, it is the only document that carefully distinguishes between these intertwined notions, and states clearly what each of these terms means and when to use it.

In a) 2, we introduce the notion of a vague system, i.e., a system that can simultaneously exist in more than one state. This is done via the mathematics of many-valued logic. The traditional approach in system theory is via binary logic, which is limited in scope.

In a) 3, we argue that when predicting remaining life, what matters most is the likelihood, not the probability model. This paper digs deep into the meaning of conditional probability and shows how one can arrive at different predictions.

In a) 5, we address the important practical question of what the coverage probability for a prediction interval should be. Should it be 90%, 95%, or something else? We argue that this is a problem in optimal decision making, a matter that has been totally overlooked.

In a) 7, we introduce a new fundamental notion, namely that of a hazard potential. We argue that items fail when suitably chosen stochastic processes hit the hazard potential. The chosen stochastic processes depend on the environment in which units and systems operate.

In a) 9, we harness the thesis of a) 7 to argue that degradation is an abstract notion, but its observable markers are things like crack growth, wear, and CD4 cell counts. We then make clear the meaning of competing risks and view them as stochastic processes. This is a change in the manner in which one thinks of competing risks and degradation.

In b) 1, we summarize our research over the past several years, much, if not all, of it supported by the ONR, in reliability and survival analysis, and systems survivability. This book, we think, is unique because it represents a paradigm shift in how one should think about reliability and survivability, and because, unlike the existing books on the subject, it delves into uncharted territory on several fronts. The point of view taken here is Bayesian, and notions like the failure rate, survival, and systems integrity are interpreted from this perspective. The book also discusses the use of expert testimonies and information-theoretic notions in failure data analysis and the design of life tests.
In our day-to-day discourse on uncertainty, words like belief, chance, plausible, likelihood and probability are commonly encountered. Often, these words are used interchangeably, because they are intended to encapsulate some loosely articulated notions about the unknowns. The purpose of this paper is to propose a framework that shows how each of these terms can be made precise, so that each reflects a distinct meaning. To construct our framework, we use a basic scenario upon which caveats are introduced. Each caveat motivates us to bring in one or more of the above notions. The scenario considered here is very basic; it arises in both the biomedical context of survival analysis and the industrial context of engineering reliability. This paper is expository and much of what is said here has been said before. However, the manner in which we introduce the material, via a hierarchy of caveats that could arise in practice, namely our proposed framework, is the novel aspect of this paper. To appreciate all this, we require of the reader a knowledge of the calculus of probability. However, in order to make our distinctions transparent, probability has to be interpreted subjectively, not as an objective relative frequency.

Keywords: Belief functions, biometry, likelihood, plausibility, quality assurance, reliability, survival analysis, uncertainty, vagueness


1. Probability and chance

1.1. Introduction: Statement of the problem and objectives

Consider the following archetypal problem that commonly arises in the contexts of biomedicine, engineering and the physical sciences.

Suppose that at some reference time τ, the “now time,” YOU are asked to predict the time to failure T of some physical or biological unit. The capitalized YOU is to emphasize the fact that it is a particular individual, namely yourself, that has been asked to make the prediction. To facilitate prediction, you examine the unit carefully and learn all that you can about its genesis: how, when and where it was made. You denote this information by ℋ(τ), for history at time τ. In the case of biological units, ℋ(τ) would pertain to genetic and/or medical information. Suppose, as is generally true, that based on ℋ(τ) you conclude that prediction with certainty is not possible. Consequently, you are now faced with two options: walk away from the problem, or make an informed guess about T.

Suppose that you choose the second option and are prepared to make guesses about the event (T > t), for some t > 0. In reliability, t > 0 is known as the “mission time.” There are several additional caveats to this basic problem that go into forming our overall framework; these will be presented in Sections 2 and 3. In Section 2, we introduce the caveat of data, and in Section 3 the caveat of surrogate information.

To keep the mathematics simple, you introduce a counter, say X, and adopt the convention that X = 1 (a “success”) whenever T > t, and X = 0 (a “failure”) otherwise. Thus, the events (T > t) and (X = 1) are isomorphic; however, there is a loss of granularity in going from T to X. This is because X continues to equal one even when T > t + a, for any and all a > 0. With the introduction of X, informed guesses about (T > t) boil down to informed guesses about (X = 1). But what do we mean by an informed guess, and how shall we make this operational? Do the terms probability, chance and likelihood constitute an informed guess, or does each of these terms connote a distinct notion? Furthermore, do these terms cover all the scenarios of uncertainty that one can possibly encounter, or are there scenarios that call for additional notions such as “belief” and “plausibility”? The aim of this paper is to show that each of the above terms encapsulates a distinct notion, so that their indiscriminate use should not be a matter of course.
1.2. Personal probability: Making guesses operational

By informed guess, we mean a quantified measure of your uncertainty about the event (X = 1) in the light of ℋ(τ), and subsequent to a thoughtful evaluation of its consequences. Now, it is generally well acknowledged that probability is a satisfactory way to quantify uncertainty, and to some, such as Lindley (1982), the only satisfactory way. There are several interpretations of probability (cf. Good (1965)). The one we shall adopt is personal probability, also known as subjective probability. Here, you quantify your uncertainty about the event (X = 1), based on ℋ(τ), by your personal probability, denoted:

$$P_Y(X = 1; \mathcal{H}(\tau)). \qquad (1)$$

The subscript indexing P emphasizes the fact that the specified probability is that of a particular individual, namely, you. For convenience, we set τ = 0 and denote ℋ(0) simply by ℋ. Henceforth, we also omit the subscript associated with P, so that Equation (1) is written:

$$P(X = 1; \mathcal{H}) = p, \qquad (2)$$

where 0 < p < 1. The p so specified is a personal probability because it need not be the same for all persons; more important, it can change with time for the same individual. This is because the background history for this person also changes, and it is the history that plays a key role in specifying a personal probability. Thus, an informed guess is tantamount to specifying a p, where p is a personal probability.

To make an informed guess operational, that is, to make a pragmatic use of it, we need to interpret p. For this we appeal to De Finetti (1974), who proposed that p represent the amount you (the specifier of p) are willing to stake in a two-sided bet (or gamble) about the event (X = 1). That is, should X turn out to be one, you receive as a reward one monetary unit against the p staked out by you. Should X turn out to be zero, then the amount staked, namely p, is lost. By a two-sided bet, we mean the willingness to stake p for the event (X = 1), or an amount (1 − p) for the event (X = 0). That is, you are indifferent between the two gambles: one monetary unit in exchange for p if (X = 1), or one monetary unit in exchange for (1 − p) if (X = 0). It is useful to bear in mind that, in keeping with the spirit of the individual nature of personal probability, the amount p represents your stake. For the same event (X = 1), your colleague may choose to stake a different amount p′, with p′ ≠ p. It is also important to note that with p interpreted as a gamble, the bet will only be settled when X reveals itself. Thus, bets can only be made operational for events that are ultimately observed. We do not consider here the disposition of the second party in the bet; we assume that the second party is willing to accept any bet put forth by you.
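The two-sided bet can be written out directly. In this Python sketch (the function names and the value p = 0.8 are hypothetical), a bet pays one monetary unit against the stake; with stakes p on (X = 1) and 1 − p on (X = 0), the expected net gain computed with your own p is zero on either side, which is one way to read your indifference between the two gambles.

```python
def net_gain(stake: float, event_occurred: bool) -> float:
    """Your net gain when you stake `stake` for one monetary unit on an event."""
    # If the event occurs you receive one unit against your stake; otherwise the stake is lost.
    return (1.0 - stake) if event_occurred else -stake


def expected_net_gain(stake: float, prob: float) -> float:
    """Expected net gain of the bet, computed with your own probability `prob` for the event."""
    return prob * net_gain(stake, True) + (1.0 - prob) * net_gain(stake, False)


p = 0.8  # hypothetical personal probability for (X = 1)
print(net_gain(p, True), net_gain(p, False))  # settled only once X reveals itself
print(expected_net_gain(p, p))                # side 1: stake p on (X = 1)
print(expected_net_gain(1 - p, 1 - p))        # side 2: stake 1 - p on (X = 0)
```

Both expected gains are zero up to floating-point rounding, for any p in (0, 1).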

Thus, to summarize, in the context of this paper, the word “probability” is used to denote the amount an individual is prepared to stake in a two-sided bet about an uncertain event. This probability can be specified based on ℋ alone, and it is not essential that ℋ contain data on items judged to be similar to the item in question. That is, personal probabilities can be specified without the benefit of having observed data.

1.3. Chance or propensity: A useful abstraction

Whereas specifying a personal probability can be done solely by introspection considering ℋ, a more systematic approach, which involves breaking the problem into smaller, easier problems, begins with invoking the law of total probability on the event (X = 1; ℋ). Specifically, for some unknown quantity θ, 0 < θ < 1, and an entity π(θ; ℋ), whose interpretation is given later in Section 1.4:

$$P(X = 1; \mathcal{H}) = \int_0^1 P(X = 1 \mid \theta; \mathcal{H})\, \pi(\theta; \mathcal{H})\, d\theta \qquad (3)$$

$$= \int_0^1 P(X = 1 \mid \theta)\, \pi(\theta; \mathcal{H})\, d\theta, \qquad (4)$$

if you assume that X is independent of ℋ given θ. That is, were you to know θ, knowledge of ℋ would be unnecessary. The meaning of θ, known as a parameter, remains to be discussed, but for now we state that in the language of personal probability, Equation (3) implies an extension of the conversation from P(X = 1; ℋ) to P(X = 1 | θ; ℋ). The idea here is that after invoking the assumption of independence, you may find it easier to quantify your uncertainty about (X = 1) were you to know θ, than to quantify the uncertainty based on ℋ. Whereas the dimension of ℋ can be very large, the dimension of θ is one. Thus, the role of the parameter θ is to simplify the process of uncertainty quantification by imparting to X independence from ℋ.

In Equation (4), the quantity P(X = 1 | θ) is known as a probability model for the binary X. Following Bernoulli, you let P(X = 1 | θ) = θ, where P(X = 1 | θ) represents your bet (personal probability) about the event (X = 1) were you to know θ. This brings us to the question of what θ means. That is, how should we interpret θ?

The meaning of θ was made transparent by De Finetti (cf. Lindley and Phillips (1976)) in his now famous theorem on binary exchangeable sequences. Loosely speaking, this theorem says that if a large number of units judged similar to each other (the technical term is exchangeable) and to the unit in question were to be observed for their survival or failure until t, and if Xᵢ = 1 if the ith item survived until t (Xᵢ = 0 otherwise), then:

$$\theta = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad (5)$$

that is, θ is the average of the Xᵢ's when the number of Xᵢ's is infinite. De Finetti refers to this θ as a chance or propensity. Note that there is no personal element involved in defining θ, other than the fact that θ derives from the behavior of exchangeable sequences, and exchangeability is a judgment. What you judge to be exchangeable may not sit well with your colleagues. Because θ connotes the limit of an exchangeable binary sequence, θ can be seen as an objective entity. More important, since θ cannot actually be observed (n in Equation (5) is infinite), we claim that chance is an abstract construct. It is a useful abstraction all the same, because in writing P(X = 1 | θ) = θ, you are saying that your stake on the uncertain event (X = 1) is θ, were you to know θ. But no one can possibly tell you what θ is, and this is what leads us to the next section. Before we get there, however, it may be of interest to say a few words about two other interpretations of θ.
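To make Equation (5) concrete, the following Python sketch simulates the mechanism behind De Finetti's representation: it first draws a chance θ from a mixing distribution (a Beta(2, 2), chosen purely for illustration) and then generates conditionally independent outcomes with P(Xᵢ = 1 | θ) = θ, which makes the sequence exchangeable. The function name `exchangeable_sequence` is hypothetical. The simulation is allowed to peek at θ, which no one can do in practice; that is precisely why the text calls chance an abstraction.

```python
import random


def exchangeable_sequence(n: int, rng: random.Random) -> tuple[float, list[int]]:
    """Draw a chance theta, then n conditionally independent Bernoulli(theta) outcomes."""
    theta = rng.betavariate(2.0, 2.0)  # hypothetical mixing distribution for the chance
    xs = [1 if rng.random() < theta else 0 for _ in range(n)]
    return theta, xs


rng = random.Random(7)
for _ in range(3):
    theta, xs = exchangeable_sequence(100_000, rng)
    # Equation (5) truncated at a finite n: the average of the X_i approximates theta.
    print(f"theta = {theta:.3f}, average of the X_i = {sum(xs) / len(xs):.3f}")
```

Each sequence settles near its own θ, and different sequences settle at different values.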

One is due to Laplace, who, in keeping with the scientific climate of his time and being influenced by Newton, was concerned with cause-and-effect relationships. Accordingly, to Laplace, θ was the cause of an effect, namely, the event (X = 1). The second interpretation of θ stems from the relative frequency interpretation of probability. Here, θ is taken to be the probability that X = 1.

Finally, even though the notion of chance introduced here has been in the context of binary variables, a parallel notion also exists for other kinds of variables.

1.4. Probability of chance: Taking chances with chance

Since θ is unknown, and in principle can never be known, you are uncertain about θ. In keeping with the dictum that all uncertainty be described by probability, you let P_Y(Θ ≤ θ; ℋ) encapsulate your bet on the event (Θ ≤ θ). Here, in keeping with standard convention, all unknown quantities are denoted by capital letters and their realized values by the corresponding small letter; thus our use of Θ and θ. Since Θ can take all values in the continuum (0, 1), we shall assume that P_Y(Θ ≤ θ; ℋ) is “absolutely continuous,” so that its density at θ exists, for 0 < θ < 1. We denote this density by π_Y(θ; ℋ) and interpret it as

$$\pi(\theta; \mathcal{H})\, d\theta \approx P(\theta \le \Theta \le \theta + d\theta; \mathcal{H}).$$

For convenience, the subscript Y has been dropped.

Thus, π(θ; ℋ)dθ is approximately your personal probability that the unknown chance Θ is in the interval [θ, θ + dθ]. Since θ will never be known, the bet on Θ cannot be settled. However, since π(θ; ℋ) goes into determining P(X = 1; ℋ) (see Equation (6) below), and since bets on (X = 1; ℋ) can be settled, π(θ; ℋ) can also be interpreted as a technical device that helps you specify your bet on an observable.

With the above in place, plus the fact that in our case P(X = 1 | θ) = θ, Equation (4) becomes:

$$P(X = 1; \mathcal{H}) = p = \int_0^1 \theta \times \pi(\theta; \mathcal{H})\, d\theta. \qquad (6)$$

Equation (6) above is noteworthy. It embodies: (i) a personal probability about the event (X = 1), the left-hand side; (ii) a chance Θ taking the value θ; and (iii) a personal probability about the chance Θ belonging to the interval [θ, θ + dθ], the entity π(θ; ℋ)dθ. This equation helps us make transparent the difference between probability, chance and the probability of chance.

There is another angle from which Equation (6) can be viewed. This comes from the fact that the right-hand side of Equation (6) is your expected value of Θ, the expected value being determined by your π(θ; ℋ). Denoting this expected value by E_Y(Θ), we have:

$$P(X = 1; \mathcal{H}) = p = E_Y(\Theta),$$

implying that your personal probability for the event (X = 1) is your expected value of the chance Θ with respect to π(θ; ℋ), your personal probability about chance.
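Equation (6) can be checked numerically. The sketch below assumes, purely for illustration, that your π(θ; ℋ) is a Beta(a, b) density with hypothetical values a = 8 and b = 2, and evaluates the integral by the midpoint rule. For a Beta prior, E(Θ) = a/(a + b) is a standard result, so the two printed numbers should agree.

```python
import math


def beta_pdf(theta: float, a: float, b: float) -> float:
    """Beta(a, b) density, used here as a hypothetical choice of pi(theta; H)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def personal_probability(prior, grid: int = 100_000) -> float:
    """Equation (6): p = integral over (0, 1) of theta * pi(theta; H), by the midpoint rule."""
    h = 1.0 / grid
    return sum((i + 0.5) * h * prior((i + 0.5) * h) * h for i in range(grid))


a, b = 8.0, 2.0  # hypothetical prior parameters
p = personal_probability(lambda t: beta_pdf(t, a, b))
print(round(p, 4), a / (a + b))  # numerical E(Theta) next to the Beta mean a/(a + b)
```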


2. The likelihood of chance

2.1. Introducing the caveat of data

We supplement the framework of the basic problem of Section 1.1 by introducing our first caveat. Suppose that in addition to ℋ(τ), you also have at hand the binary x₁, …, xₙ, where xᵢ = 1 if the life-length of the ith item has actually been observed to exceed t, and xᵢ = 0 otherwise. The n items that go into constituting the data x = (x₁, …, xₙ) are judged by you, prior to observing the xᵢ, to be similar (or exchangeable) to the item in question. What can you now say about the unobserved X? In other words, what is your prediction for the event (X = 1) in the light of ℋ(τ) as well as x? Certainly, the observed x should help you sharpen your prediction. Consequently, you are now called upon to assess P(X = 1; x, ℋ).

One possibility would be to think hard about all that you have at hand, namely, x and ℋ, and then simply specify P(X = 1; x, ℋ) as p*, where p* ∈ (0, 1). Here p* encapsulates your bet on the event (X = 1) in the light of x and ℋ. If p* happens to be identical to the p of Equation (2), then you are declaring the opinion that the data x have not had a sufficient impact on your beliefs for you to change your bet from your original p. From a philosophical point of view, there is nothing in the theory of subjective probability that stops you from specifying a p* by introspection alone. However, from a computational point of view, it is efficient to proceed formally along the lines given below, because introspection to specify p* subsequent to having specified p may lead to an inconsistency (technically, incoherence). By incoherence, we mean a scenario involving a gamble in which “heads I win, tails you lose.”
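The simplest instance of incoherence can be exhibited in a few lines. The sketch below does not reproduce the p versus p* case discussed above; it treats the more elementary situation in which your stakes on (X = 1) and (X = 0) do not sum to one, so that an opponent who chooses the side of each two-sided bet can guarantee you a loss whatever X turns out to be. The function names and stakes are hypothetical.

```python
def holder_gain(stake: float, event_occurred: bool) -> float:
    """Net gain to whoever holds a bet that costs `stake` and pays one unit if the event occurs."""
    return (1.0 - stake) if event_occurred else -stake


def sure_outcome(q1: float, q0: float) -> dict[int, float]:
    """Your net gain, for each value x of X, given stakes q1 on (X = 1) and q0 on (X = 0).

    Both bets are two-sided, so the opponent picks the side: if q1 + q0 > 1 the
    opponent sells you both bets; otherwise the opponent buys both bets from you.
    """
    you_hold = q1 + q0 > 1.0
    gains = {}
    for x in (0, 1):
        holder = holder_gain(q1, x == 1) + holder_gain(q0, x == 0)
        gains[x] = holder if you_hold else -holder
    return gains


print(sure_outcome(0.7, 0.4))  # stakes sum to 1.1: a loss of about 0.1 whether X is 0 or 1
print(sure_outcome(0.5, 0.3))  # stakes sum to 0.8: a loss of about 0.2 whether X is 0 or 1
print(sure_outcome(0.7, 0.3))  # coherent stakes: zero net gain, up to rounding
```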

2.2. Bayes' law: The mathematics of changing your mind

To address the scenario presented in Section 2.1, you start by pondering the matter of assessing your uncertainty about (X = 1), in the light of ℋ, were you to know (but do not know) the disposition of X₁, …, Xₙ; here Xᵢ = 1 if the ith item judged to be similar to the item in question has a life-length that exceeds t (Xᵢ = 0 otherwise). That is, what would be your P(X = 1 | X₁, …, Xₙ; ℋ)? To address this question, you follow the same line of reasoning used to arrive at Equation (4), that is, extend the conversation to θ, and obtain

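$$P(X = 1 \mid X_1, \ldots, X_n; \mathcal{H}) = \int_0^1 P(X = 1 \mid \theta, X_1, \ldots, X_n; \mathcal{H})\, \pi(\theta \mid X_1, \ldots, X_n; \mathcal{H})\, d\theta$$

$$= \int_0^1 P(X = 1 \mid \theta)\, \pi(\theta \mid X_1, \ldots, X_n; \mathcal{H})\, d\theta$$

$$= \int_0^1 \theta\, \pi(\theta \mid X_1, \ldots, X_n; \mathcal{H})\, d\theta. \qquad (7)$$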

The second equality is a consequence of your judgment that X is independent of X₁, …, Xₙ (and of ℋ), were you to know θ, and the third a consequence of choosing P(X = 1 | θ) = θ as a probability model for X. The quantity π(θ | X₁, …, Xₙ; ℋ) is the probability density at θ of your P(Θ ≤ θ | X₁, …, Xₙ; ℋ).

To obtain π(θ | X₁, …, Xₙ; ℋ), you invoke Bayes' law; thus:

$$\pi(\theta \mid X_1, \ldots, X_n; \mathcal{H}) \propto P(X_1, \ldots, X_n \mid \theta; \mathcal{H}) \times \pi(\theta; \mathcal{H})$$

$$= \prod_{i=1}^{n} P(X_i = x_i \mid \theta) \times \pi(\theta; \mathcal{H}), \qquad (8)$$

by the multiplication rule, and by the independence of the Xᵢ's from each other, were you to know θ, and with xᵢ = 1 or 0. For P(Xᵢ = xᵢ | θ), you once again choose Bernoulli's model, so that $P(X_i = x_i \mid \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}$.

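With the above in place, you now have:

$$\pi(\theta \mid X_1, \ldots, X_n; \mathcal{H}) \propto \prod_{i=1}^{n} \left\{ \theta^{x_i} (1 - \theta)^{1 - x_i} \right\} \times \pi(\theta; \mathcal{H}). \qquad (9)$$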

Since π(θ; ℋ) encapsulates your uncertainty about Θ in the light of ℋ alone, and π(θ | X₁, …, Xₙ; ℋ) your uncertainty about it were you to be provided additional information via the X₁, …, Xₙ, we say that Bayes' law provides a mathematical prescription for changing your mind about the unobservable Θ. Once Equation (9) is at hand, we may incorporate it in Equation (7) to write:

$$P(X = 1 \mid X_1, \ldots, X_n; \mathcal{H}) \propto \int_0^1 \theta \prod_{i=1}^{n} \left\{ \theta^{x_i} (1 - \theta)^{1 - x_i} \right\} \pi(\theta; \mathcal{H})\, d\theta, \qquad (10)$$

as a prescription of how to change your mind about the event (X = 1) itself. The constant of proportionality in Equation (10) is the reciprocal of the integral of the right-hand side of Equation (9) over (0, 1).
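The following Python sketch evaluates Equations (9) and (10) on a midpoint grid, computing the constant of proportionality explicitly. The Beta(2, 2) prior and the ten data values are hypothetical. For a Beta(a, b) prior, the standard conjugate result gives P(X = 1; x, ℋ) = (a + r)/(a + b + n), where r is the number of successes, which serves as a check.

```python
import math


def beta_pdf(theta: float, a: float, b: float) -> float:
    """Beta(a, b) density; a hypothetical choice for pi(theta; H)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def predictive(xs: list[int], prior, grid: int = 20_000) -> float:
    """Equation (10) with its normalizing constant, evaluated on a midpoint grid over (0, 1)."""
    h = 1.0 / grid
    num = den = 0.0
    for i in range(grid):
        t = (i + 0.5) * h
        # Equation (9): product of Bernoulli terms times the prior density.
        post = math.prod(t ** x * (1.0 - t) ** (1 - x) for x in xs) * prior(t)
        num += t * post * h
        den += post * h  # the normalizing constant of Equations (9) and (10)
    return num / den


xs = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # hypothetical data: r = 8 successes in n = 10
a, b = 2.0, 2.0
before = predictive([], lambda t: beta_pdf(t, a, b))  # no data: Equation (6)
after = predictive(xs, lambda t: beta_pdf(t, a, b))   # with data: Equation (10)
print(before, after, (a + 8) / (a + b + 10))
```

Here the data move the bet on (X = 1) from 0.5 to about 0.714.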

2.3. Likelihood function: The weight of evidence

There are two aspects of Equations (8) to (10) that need to be emphasized. The first is that the left-hand sides of these equations pertain to conditional events, namely the proposition “were you to know the disposition of the Xᵢ, i = 1, …, n,” that is, supposing you were provided with the realizations of each Xᵢ. The second feature is that they inform the reader as to how you express your uncertainties (or bets) about Θ and X, respectively, once the Xᵢ reveal themselves as xᵢ. Implicit in this bet is your particular choice of probability models P(X = x | θ) and P(Xᵢ = xᵢ | θ), i = 1, …, n.

In actuality, however, the Xᵢ have indeed revealed themselves in the form of data, as x = (x₁, …, xₙ), where each xᵢ is known to you as being one or zero. In view of this, the left-hand sides of Equations (8) to (10) should be rewritten as π(θ; x₁, …, xₙ, ℋ) and P(X = 1; x₁, …, xₙ, ℋ), respectively. But more significant is the fact that the quantity P(Xᵢ = xᵢ | θ) of Equation (8) can no longer be interpreted as a probability. This is because the notion of probability is germane only for events that have yet to occur, or for events that have occurred but whose disposition is not known to you. In our case, Xᵢ is known to you as xᵢ = 1 or xᵢ = 0; thus P(Xᵢ = xᵢ | θ) is not a probability. So what does the quantity $P(X_i = x_i \mid \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}$, with xᵢ fixed as zero or one and θ unknown, mean? Similarly, in the context of Equation (9), with $r = \sum_{i=1}^{n} x_i$, what does the quantity

$$\prod_{i=1}^{n} \left\{ \theta^{x_i} (1 - \theta)^{1 - x_i} \right\} = \theta^{r} (1 - \theta)^{n - r}, \qquad (11)$$

with n and r known, but θ unknown, mean? Note that r is the total number of successes.

As a function of θ, with n and r fixed, the quantity $\theta^{r}(1 - \theta)^{n - r}$ is called the likelihood function of θ; it is denoted $\mathcal{L}_Y(\theta; n, r)$, the subscript, which will henceforth be dropped, signaling the fact that, like probability, the likelihood function is also personal. Since the quantities P(Xᵢ = xᵢ | θ) that make up ℒ(θ; n, r) are not probabilities, the likelihood function, even though it is derived from a probability model, is not a probability. It can be viewed as a function that assigns weights to the different values θ that Θ can take, in the light of the known n and r; these latter quantities can be viewed as evidence. Thus, the likelihood function can be interpreted as a function that prescribes the weight of evidence provided by the data for the different values that the chance Θ can take. For example, with n = r = 1, ℒ(θ; n = r = 1) = θ; this suggests (see Fig. 1) that with n = r = 1, more weight is given by the likelihood function to the large values of θ than to the smaller values.
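The likelihood function of Equation (11) is easy to tabulate. This sketch (function names hypothetical) shows the weights it assigns for n = r = 1, the case of Fig. 1, and for hypothetical data with n = 10 and r = 7. It also shows one concrete sense in which ℒ is not a probability density in θ: for n = r = 1 its area over (0, 1) is 1/2, not 1.

```python
def likelihood(theta: float, n: int, r: int) -> float:
    """Equation (11): L(theta; n, r) = theta**r * (1 - theta)**(n - r), with n and r fixed."""
    return theta ** r * (1.0 - theta) ** (n - r)


def area_under(f, grid: int = 100_000) -> float:
    """Midpoint-rule integral of f over (0, 1)."""
    h = 1.0 / grid
    return sum(f((i + 0.5) * h) for i in range(grid)) * h


# n = r = 1 (the case of Fig. 1): the weight grows with theta.
weights = [likelihood(k / 10, 1, 1) for k in range(11)]
print(weights)

# Not a probability density in theta: the area under L(theta; 1, 1) is 1/2.
print(area_under(lambda t: likelihood(t, 1, 1)))

# Hypothetical data n = 10, r = 7: the greatest weight sits at theta = r / n.
thetas = [k / 1000 for k in range(1001)]
best = max(thetas, key=lambda t: likelihood(t, 10, 7))
print(best)
```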

To summarize, the expression $P(X_i = x_i \mid \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}$ specifies a probability of the event (Xᵢ = xᵢ) when Xᵢ is unknown and θ is assumed known; whereas with Xᵢ known as xᵢ, it specifies a likelihood for the unknown θ. With x known, Equation (10), when correctly written, becomes:

$$P(X = 1; x, \mathcal{H}) \propto \int_0^1 \theta \left\{ \theta^{r} (1 - \theta)^{n - r} \right\} \times \pi(\theta; \mathcal{H})\, d\theta. \qquad (12)$$
Fig. 1. The likelihood function with n = r = 1.

Equation (12) is interesting. It encapsulates, as we read from left to right, the four notions we have introduced thus far: personal probability (the left-hand side); chance (the parameter θ); the likelihood of chance (the quantity $\theta^{r}(1 - \theta)^{n - r}$); and the probability of chance (the quantity π(θ; ℋ)).

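Note also that the right-hand side of Equation (12) is the expected value of a function of Θ, namely, the function $\Theta^{r+1}(1 - \Theta)^{n - r}$, taken with respect to π(θ; ℋ). Thus, we may say that the effect of the data x is to change your bet on the event (X = 1) from $E_Y(\Theta)$ to a quantity proportional to

$$E_Y\left(\Theta^{r+1} (1 - \Theta)^{n - r}\right),$$

the constant of proportionality being $1 / E_Y\left(\Theta^{r} (1 - \Theta)^{n - r}\right)$.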
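Reading Equation (12) as a ratio of two expectations under π(θ; ℋ) suggests a Monte Carlo sketch: draw chances from π(θ; ℋ), weight each draw by its likelihood, and average. The Beta(2, 2) prior, the data n = 10 and r = 8, and the function name are illustrative assumptions; the conjugate value (a + r)/(a + b + n) is printed alongside for comparison.

```python
import random


def updated_bet_mc(n: int, r: int, a: float, b: float,
                   draws: int = 200_000, seed: int = 1) -> float:
    """Estimate E(Theta^(r+1) (1-Theta)^(n-r)) / E(Theta^r (1-Theta)^(n-r)) under Beta(a, b)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(draws):
        t = rng.betavariate(a, b)            # a chance drawn from the (hypothetical) prior
        lik = t ** r * (1.0 - t) ** (n - r)  # likelihood of chance, Equation (11)
        num += t * lik                       # accumulates Theta^(r+1) (1-Theta)^(n-r)
        den += lik                           # accumulates Theta^r (1-Theta)^(n-r)
    return num / den


n, r = 10, 8
a, b = 2.0, 2.0  # hypothetical Beta prior
print(updated_bet_mc(n, r, a, b), (a + r) / (a + b + n))
```

With n = r = 0 the weights are all one and the estimate reduces to E(Θ), the bet of Equation (6).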


3. Imprecise surrogates: motivation for vagueness and belief

In Section 1 we outlined a problem that is the focus of our discussion, and in Section 2 we added a feature to it by bringing in the role of data. The notions used in Sections 1 and 2 are probability, chance and likelihood. Are these the only ones needed to address all problems pertaining to uncertainty? Are there circumstances that challenge our ability to lean on these notions alone? If so, what are these, and under what scenarios do we need to go beyond what has been introduced and discussed? The purpose of this section is to address the above and related questions. But first we bring into play our second caveat and explore the circumstances under which the notions of probability, chance and likelihood will suffice to address this caveat. The caveat in question pertains to the presence or absence of detectable anomalies during inspection, quality control and other diagnostic testing functions.

3.1. Anomalies: A surrogate of failure

To keep our discussion simple, suppose that in order to assess your uncertainty about the event (X = 1), you have at your disposal ℋ and also a knowledge of the presence or the absence of a detectable anomaly. An anomaly could be a visible defect, or noticeable damage, or some other suitable indicator of imperfection. Anomalies could be present and yet not be detected. We denote the presence of a detected anomaly by letting a binary variable Y take the value one, and the absence of a detectable anomaly by letting Y = 0. The presence of an anomaly does not necessarily imply that X will be zero; similarly, its absence is no assurance (to you) that X will be one; see Fig. 2. Rather, like the X₁, …, Xₙ of Section 2, the presence or absence of a detectable anomaly helps you sharpen your assessment of the uncertainty about (X = 1).

Fig. 2. Effect of anomalies on survival.

Suppose, then, that Y = y has been observed, with y = 1 or 0, and that you are required to assess P(X = 1; y, ℋ). A simple way to proceed would be to treat y as a part of ℋ, and upon careful introspection specify:

$$P(X = 1; y, \mathcal{H}) = \tilde{p}, \qquad 0 < \tilde{p} < 1,$$

as your bet on the event (X = 1). The p̃ above is like the p of Section 1, in the sense that if p̃ = p, then y has had no effect on your disposition about (X = 1). There is, of course, a more systematic way to incorporate the effect of y into your analysis, and this involves a use of the likelihood. To see how, start by pondering the matter of assessing your uncertainty about the event (X = 1), in the light of ℋ, were you to know (but do not know) the disposition of Y. This is what was also done in Section 2.2. That is, you ask yourself what P(X = 1 | Y; ℋ) should be. By Bayes' law:

$$P(X = 1 \mid Y = y; \mathcal{H}) \propto P(Y = y \mid X = 1; \mathcal{H}) \times P(X = 1; \mathcal{H}), \qquad y = 1, 0.$$

For P(X = 1; ℋ) you may use your p of Equation (2). To proceed further, you need to specify a probability model for Y, conditional on (X = 1). That is, you need to specify P(Y = 1 | X = 1; ℋ) and P(Y = 0 | X = 1; ℋ), together with their counterparts conditional on (X = 0) if the constant of proportionality is to be determined; this is tantamount to specifying a joint distribution for X and Y. Once this can be done, you have:

$$P(X = 1 \mid Y = y; \mathcal{H}) \propto P(Y = y \mid X = 1; \mathcal{H}) \times p. \qquad (13)$$
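A minimal sketch of Equation (13) for a single observed y, with the normalizing step written out. All numbers are hypothetical, and the model P(Y = y | X = 0; ℋ) is an additional assumption that the normalization needs; the function name is illustrative.

```python
def posterior_x_given_y(p: float, p_y_given_x1: dict[int, float],
                        p_y_given_x0: dict[int, float], y: int) -> float:
    """Equation (13), normalized: P(X = 1 | Y = y; H) by Bayes' law over the two values of X."""
    joint1 = p_y_given_x1[y] * p          # P(Y = y | X = 1) * P(X = 1)
    joint0 = p_y_given_x0[y] * (1.0 - p)  # P(Y = y | X = 0) * P(X = 0)
    return joint1 / (joint1 + joint0)     # dividing by the sum fixes the proportionality constant


p = 0.8  # hypothetical prior bet on (X = 1), the p of Equation (2)
# Hypothetical anomaly model: survivors rarely show a detected anomaly, failures often do.
p_y_given_x1 = {1: 0.1, 0: 0.9}
p_y_given_x0 = {1: 0.6, 0: 0.4}
print(posterior_x_given_y(p, p_y_given_x1, p_y_given_x0, y=1))  # anomaly detected
print(posterior_x_given_y(p, p_y_given_x1, p_y_given_x0, y=0))  # no anomaly detected
```

With these inputs, a detected anomaly lowers the bet on (X = 1) from 0.8 to 0.4, and its absence raises it to 0.9.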

However,  in  actuality,  Y  has  been  observed  as  y  -  1  or 
y  —  0.  Consequently,  Equation  (13)  becomes 

P(X  =  1  ;y9  H)  oc  L(X  =  1  ;y9  H)  x  p,  (14) 
where $\mathcal{L}(X = 1; y, \mathcal{H})$ is your likelihood function for the unknown event (X = 1) in the light of the evidence y and $\mathcal{H}$. The probability model $P(Y = y \mid X = 1; \mathcal{H})$ helps you specify the likelihood. Equation (14) says that your bet on the event (X = 1) in the light of y and $\mathcal{H}$ is proportional to your bet on (X = 1) based on $\mathcal{H}$ alone, multiplied by your likelihood. The approach prescribed above is more systematic than the one involving the specification of $p^*$ based on introspection alone, because it incorporates the p of Equation (2). A key point to note is that $\mathcal{L}(X = 1; y, \mathcal{H})$ is the likelihood of an observable event; it is not the likelihood of chance $\theta$ discussed in Section 2.3. Should you prefer to work with the likelihood of chance, then you must introduce chance into your pondering. To do so, you may proceed as follows:

$$P(X = 1 \mid Y; \mathcal{H}) = \int_0^1 P(X = 1 \mid \theta, Y; \mathcal{H}) \times \pi(\theta \mid Y; \mathcal{H})\, d\theta,$$

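which extends the conversation to $\theta$, as was done to arrive at Equation (3). If you now assume that (X = 1) is independent of both Y and $\mathcal{H}$, were you to know $\theta$, and assume Bernoulli’s model, then:

$$P(X = 1 \mid Y; \mathcal{H}) = \int_0^1 \theta \times \pi(\theta \mid Y; \mathcal{H})\, d\theta. \tag{15}$$

But by Bayes’ law:

$$\pi(\theta \mid Y; \mathcal{H}) \propto P(Y = y \mid \theta; \mathcal{H}) \times \pi(\theta; \mathcal{H}). \tag{16}$$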

Consequently, to proceed further, you need to specify a probability model for the anomaly Y, were you to know $\theta$, and also $\pi(\theta; \mathcal{H})$, an entity that has already appeared in Sections 1 and 2. Since Y has in actuality been observed (as y = 1 or y = 0), Equation (16) becomes:

$$\pi(\theta \mid y; \mathcal{H}) \propto \mathcal{L}(\theta; y, \mathcal{H}) \times \pi(\theta; \mathcal{H}),$$

where $\mathcal{L}(\theta; y, \mathcal{H})$ is the likelihood function of the chance $\theta$, in the light of $\mathcal{H}$ and evidence about the anomaly y. With the above in place, Equation (15) becomes:

$$P(X = 1; y, \mathcal{H}) \propto \int_0^1 \theta \times \mathcal{L}(\theta; y, \mathcal{H}) \times \pi(\theta; \mathcal{H})\, d\theta.$$

To compare the above equation with Equation (14) (their left-hand sides are the same), we note that since $p = E(\theta)$, Equation (14) may also be written as

$$P(X = 1; y, \mathcal{H}) \propto \int_0^1 \theta \times \mathcal{L}(X = 1; y, \mathcal{H}) \times \pi(\theta; \mathcal{H})\, d\theta.$$

The last two equations signal the fact that in order to incorporate the effect of the detected anomalies into the assessment of your uncertainty about (X = 1), you should be prepared to specify either the likelihood of (X = 1) in the light of y (and $\mathcal{H}$), or the likelihood of $\theta$ in the light of y (and $\mathcal{H}$), whichever is more convenient. To specify these likelihoods, you may want to specify $P(Y = y \mid X = 1; \mathcal{H})$ or $P(Y = y \mid \theta; \mathcal{H})$, probability models for Y, were you to know X or $\theta$, respectively. Of these, the former may be easier to assess than the latter, since it is based only on observables. We shall therefore focus on the case $P(Y = y \mid X; \mathcal{H})$, and refer to it as a postmortem probability model.
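As an illustration only, Equations (13) and (14) can be sketched in Python. Because (14) holds only up to proportionality, the sketch also asks for $P(Y = y \mid X = 0; \mathcal{H})$ so that the result can be normalised; that extra input and all numbers below are hypothetical, not values from the text.

```python
def posterior_survival(p, p_y_given_survive, p_y_given_fail):
    """Normalised form of Equation (14): P(X = 1; y, H).

    p                 -- prior P(X = 1; H), the p of Equation (2)
    p_y_given_survive -- postmortem probability P(Y = y | X = 1; H), i.e. the
                         likelihood L(X = 1; y, H) at the observed y
    p_y_given_fail    -- postmortem probability P(Y = y | X = 0; H); an extra
                         input used only to fix the constant of proportionality
    """
    numerator = p_y_given_survive * p
    evidence = numerator + p_y_given_fail * (1.0 - p)
    return numerator / evidence


# Hypothetical postmortem model: P(Y = 1 | X = 1) = 0.1, P(Y = 1 | X = 0) = 0.3
p = 0.9
after_anomaly = posterior_survival(p, 0.1, 0.3)     # y = 1 observed
after_no_anomaly = posterior_survival(p, 0.9, 0.7)  # y = 0 observed
print(after_anomaly, after_no_anomaly)
```

Under these assumed numbers, a detected anomaly lowers the survival probability from 0.9 to 0.75, while its absence raises it to about 0.92. If the two postmortem probabilities are equal, the posterior equals the prior p, which is the case in which y has no effect on your disposition about (X = 1).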

3.2.  Eliciting  postmortem  probabilities:  Potential  obstacles 

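The material of Sections 1 and 2 required of you the specification of $P(X = x \mid \theta)$ and $\pi(\theta; \mathcal{H})$, for x = 1 or 0. For the former, Bernoulli’s model is a natural choice; for the latter, a beta density with parameters $\alpha$ and $\beta$ is a choice with much flexibility. Thus, for $0 < \theta < 1$:

$$P(X = x \mid \theta) = \theta^{x}(1 - \theta)^{1 - x},$$

and

$$\pi(\theta; \mathcal{H}) = \pi(\theta; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha - 1}(1 - \theta)^{\beta - 1}.$$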
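The chance route of Equations (15) and (16) can be sketched numerically with this beta prior. The anomaly model $P(Y = 1 \mid \theta; \mathcal{H}) = 1 - \theta$ used below is a hypothetical choice, made only so that the answer can be checked in closed form: with it, a Beta($\alpha$, $\beta$) prior becomes Beta($\alpha$, $\beta + 1$) after an anomaly is seen, so Equation (15) should return $\alpha/(\alpha + \beta + 1)$.

```python
import math


def beta_pdf(theta, a, b):
    """Beta density pi(theta; alpha, beta), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def anomaly_model(theta):
    """Hypothetical P(Y = 1 | theta; H): anomalies are more likely on weak items."""
    return 1.0 - theta


def survival_given_report(y, a, b, p_anomaly=anomaly_model, n=20000):
    """Equations (15)-(16) by the midpoint rule on (0, 1); y is 1 (anomaly) or 0."""
    h = 1.0 / n
    num = 0.0  # integral of theta * L(theta; y, H) * pi(theta; H)
    den = 0.0  # integral of L(theta; y, H) * pi(theta; H), the constant in (16)
    for i in range(n):
        theta = (i + 0.5) * h
        p1 = p_anomaly(theta)
        like = p1 if y == 1 else 1.0 - p1
        w = like * beta_pdf(theta, a, b) * h
        num += theta * w
        den += w
    return num / den


a, b = 2.0, 3.0  # hypothetical prior; p = E(theta) = a / (a + b) = 0.4
print(survival_given_report(1, a, b), a / (a + b + 1))
```

When the anomaly model does not depend on $\theta$, the report is uninformative and the sketch returns the prior mean $E(\theta) = p$, in line with the identity $p = E(\theta)$ used above.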

Coming to the scenario of Section 3, you are required to specify the above, and also a model for the postmortem probability $P(Y = y \mid X = x; \mathcal{H})$, for x, y = 1 or 0. The latter could pose two difficulties. The first is that you should be able to probabilistically relate detectable anomalies and failure; Fig. 2 with the direction of the arrows reversed could provide guidance. The second, a bigger problem, can arise because the absence or the presence of any trait which qualifies as an anomaly may not be easily determined. For example, both a surface scratch and a dent qualify as defects, but the former could be less deleterious to an item’s survival than the latter. Also, at what point does a rough scratch get labeled as a dent? The classification of an anomaly is therefore not crisp, so that the event “anomaly” is not well defined. It is this lack of crispness that motivates a consideration of “vagueness” as another aspect of uncertainty quantification; more on this will be said in Section 4.

One manifestation of this absence of crispness is that responses to questions for eliciting postmortem probabilities tend to be unhelpful. The following two responses from an actual scenario are illustrative.

1.  “If  the  unit  works,  there  is  a  less  than  20%  chance  that 
we  would  have  detected  an  anomaly.  If  it  does  not,  we 
would  be  seeing  something  20-40%  of  the  time.” 

2.  “If  it  works,  that  means  that  it  was  well  manufactured. 
If  it  does  not,  then  it  means  that  it  was  handled  poorly 
when  it  was  shipped.” 

Clearly, pinning down postmortem probabilities from statements like the two above is not possible. At best, statement 1 can provide bounds on the postmortem probabilities, and statement 2 has no probabilistic content whatsoever. Yet statements 1 and 2 provide information, albeit not in the form required by the calculus of probability.
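To see what such bounds look like, here is a small Python sketch. It assumes that statement 1 is read as the closed intervals [0, 0.2] for $P(Y = 1 \mid X = 1)$ and [0.2, 0.4] for $P(Y = 1 \mid X = 0)$, and it uses a hypothetical prior p = 0.9. Because the normalised form of Equation (14) is monotone in each postmortem probability separately, its extremes occur at the corners of the two intervals.

```python
from itertools import product


def posterior_bounds(p, a_interval, b_interval, y):
    """Range of P(X = 1 | Y = y) when only intervals are elicited for
    a = P(Y = 1 | X = 1) and b = P(Y = 1 | X = 0).

    The normalised Equation (14) is monotone in a and in b separately,
    so evaluating it at the corners of the two intervals is enough.
    """
    values = []
    for a, b in product(a_interval, b_interval):
        like_survive = a if y == 1 else 1.0 - a
        like_fail = b if y == 1 else 1.0 - b
        values.append(like_survive * p / (like_survive * p + like_fail * (1.0 - p)))
    return min(values), max(values)


# Statement 1 read as closed intervals; p = 0.9 is a hypothetical prior.
print(posterior_bounds(0.9, (0.0, 0.2), (0.2, 0.4), y=0))
print(posterior_bounds(0.9, (0.0, 0.2), (0.2, 0.4), y=1))
```

Under these assumptions, seeing no anomaly pins the survival probability to between 0.9 and 0.9375, whereas seeing one leaves it anywhere in [0, 0.9]: informative, but not a single postmortem probability.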

To summarize, as long as the event “anomaly” is well defined so that one is able to precisely specify the postmortem probabilities, the development of Section 3.1 can be used, and to do so all that one needs are the notions of probability, chance and likelihood. Once difficulties of the type discussed above come into play, postmortem probabilities cannot be elicited. When such is the case, the notions of “vagueness” and “belief” enter the arena of uncertainty quantification. We emphasize that we do not see these notions as a prelude to supplanting probability; rather, they enhance probability by making its use more encompassing. However, to some, like Zadeh (1978), the notion of vagueness invites alternatives to probability, a matter upon which we disagree.


4.  Harnessing  vagueness:  Uncertainty  quantification 
under  imprecision 

What do we mean by the term “vagueness”? Is it synonymous with the term “imprecision”? How do vagueness and imprecision enter the arena of uncertainty quantification? These are some of the questions that we aim to address in this section. We shall use the scenario of anomalies discussed in Section 3 as a point of discussion.

4.1.  Fuzzy  sets  and  the  uncertainty  of  classification 

As a preamble, recall that in Section 3.1, Y was a binary variable taking values y = 0 or y = 1, with Y = 0 (1) denoting the absence (presence) of a detectable anomaly. Declaring that Y = 0 or 1 is often a judgment call, which does not encapsulate the degree of the anomaly. In this section we refine the above process by introducing some granularity to the values y that Y can take. To do so, we let Y denote some undesirable characteristic of the item in question that can be quantified, for instance the depth of a scratch, and allow Y to take a continuum of values y in some well-defined range, say $\mathcal{R} = [0, M]$, where M is specified. Let A, a subset of $\mathcal{R}$, be the set of all ys that lead to the assessment that the item in question has an anomaly. Now if there exists a value $y^*$ such that for any $y > y^*$ an anomaly is declared, then A is called a crisp (or a sharp) set; crisp to reflect the fact that A has well-defined boundaries. Consequently, any y can be placed with precision in the set A, or its complement. Crisp sets are said to adhere to the law of the excluded middle, in the sense that any y either does belong or does not belong to A. However, if it is not possible to identify a $y^*$ of the kind described above, then a boundary of A is not well defined. Consequently, we are unable to classify the membership of certain ys in A with definitiveness (or precision). Such ys can simultaneously belong and not belong to A. Sets which exhibit the property of having boundaries that are not sharp are said to be fuzzy. Fuzzy sets do not adhere to the law of the excluded middle. In the context of the scenario considered here, one may not be able to classify, with definiteness, certain defects as being anomalies. That is, there could arise, in practice, scenarios in which there is an uncertainty (in a subject matter specialist’s mind) about classifying a defect as being an anomaly or not, and also an unwillingness (of the specialist) to assign probabilities to the uncertainty of classification.

To summarize, fuzzy sets are those whose boundaries are not well defined, and imprecision pertains to an inability to place with certainty every element of a set, such as $\mathcal{R}$, into its fuzzy subset, such as A. That is, imprecision is a consequence of vagueness.

The Kolmogorov axiomatization of probability is developed on the premise that probability measures be defined on sharp sets (cf. Billingsley (1985), p. 20). Thus, the appearance of fuzzy sets requires of us ways to develop approaches whereby probabilities can be endowed to fuzzy sets as well. A strategy for doing so is via the introduction of “membership functions” which, though not probabilistic, can be seen as a subject matter specialist’s classification “probabilities.” Membership functions are discussed in Section 4.2 and their use for inducing probabilities on fuzzy sets is discussed in Section 4.3. As a final reminder, it is important to keep in mind that the material of Sections 4.2 and 4.3 will not come into play if the event “anomaly” can be well defined.


4.2.  The  membership  function  of  a  fuzzy  set 

The membership function of a fuzzy set A encapsulates the degree to which any $y \in \mathcal{R}$ belongs to A. It is denoted by $\mu_A(y)$, for every y. It is important to note that $\mu_A(y)$ is not a probability, because $\int_{\mathcal{R}} \mu_A(y)\, dy$ need not be one; however, it is often the case that $0 \le \mu_A(y) \le 1$, for all y. Operations with fuzzy sets, such as unions, intersections and complements, are facilitated by the membership function. Like probability, the membership function is subjectively specified, and may change from person to person. The membership function of a crisp set is an indicator function; i.e., if A is a crisp set, then $\mu_A(y) = 0$ for $y < y^*$ and $\mu_A(y) = 1$ otherwise. For the scenario of anomalies considered here, with y encapsulating the magnitude of a defect, $\mu_A(y)$ would be of the form illustrated in Fig. 3. Small values of y would certainly not be viewed as an anomaly and large values certainly would. For the intermediate values of y, $\mu_A(y)$ shows the extent to which y would be judged (by one particular individual) to be an anomaly.
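The contrast between a crisp and a fuzzy anomaly set can be sketched in Python. The piecewise-linear ramp below is a hypothetical stand-in for the curve of Fig. 3, and the cut-offs y_low = 2, y_high = 6 and M = 10 are illustrative values only. The last function checks numerically that a membership function need not integrate to one over [0, M].

```python
def crisp_membership(y, y_star):
    """Indicator function of the crisp set A = {y : y >= y_star}."""
    return 0.0 if y < y_star else 1.0


def ramp_membership(y, y_low, y_high):
    """Hypothetical fuzzy membership mu_A(y): 0 below y_low (surely no anomaly),
    1 above y_high (surely an anomaly), and linear in between."""
    if y <= y_low:
        return 0.0
    if y >= y_high:
        return 1.0
    return (y - y_low) / (y_high - y_low)


def integrate_membership(mu, upper, n=1000):
    """Midpoint-rule integral of mu over [0, upper]; this need not equal one."""
    h = upper / n
    return sum(mu((i + 0.5) * h) for i in range(n)) * h


def mu_example(y):
    # hypothetical cut-offs y_low = 2 and y_high = 6 on [0, M] with M = 10
    return ramp_membership(y, 2.0, 6.0)


print(mu_example(1.0), mu_example(4.0), mu_example(8.0), integrate_membership(mu_example, 10.0))
```

Here a scratch of depth 4 belongs to A only to degree 0.5, and the membership function integrates to 6 rather than 1, which is one reason it cannot be read as a probability density.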

4.3.  Endowing  probabilities  to  fuzzy  sets 

By endowing probabilities to fuzzy sets we mean assessing our personal probability that Y belongs to A in the light of the membership function $\mu_A(y)$. For this we first need to assess our personal probability that Y reveals itself as y (that is, our probability that the outcome of Y is y) and our personal probability that the revealed y belongs to A. Supposing Y to take discrete values, we denote the above personal probabilities by $P_Y(Y = y)$ and $P_Y(y \in A)$, respectively. The need for this latter probability entails a philosophical argument whose roots can be traced to Laplace. By interpreting $\mu_A(y)$ as a likelihood function and invoking Bayes’ law, Singpurwalla and Booker (2004) go through some standard technical manipulations to evaluate the constants of proportionality and to argue that:

$$P_Y(Y \in A; \mu_A(y)) = \sum_{y} \left[1 + \frac{1 - \mu_A(y)}{\mu_A(y)} \cdot \frac{P_Y(y \notin A)}{P_Y(y \in A)}\right]^{-1} P_Y(Y = y). \tag{17}$$

See Equation (10) of Singpurwalla and Booker (2003).

Fig. 3. Membership function of a fuzzy set A.

4.4. Assessing failure probability with imprecisely specified anomalies

With Equation (17) in place, it is a relatively straightforward matter to obtain an analogue of the postmortem probability when the classification of anomalies is imprecise, as

$$P(Y \in A \mid X; \mu_A(y)) = \sum_{y} \left[1 + \frac{1 - \mu_A(y)}{\mu_A(y)} \cdot \frac{P(y \notin A)}{P(y \in A)}\right]^{-1} P(Y = y \mid X), \tag{18}$$

where for convenience the subscripts associated with all the Ps have been omitted. The key difference between Equations (17) and (18) is in the last term. The former entails an unconditional probability for Y; the latter, a conditional probability that Y reveals itself as y, given X, the disposition of an item’s status (surviving or failed). Note that $P(Y = y \mid X)$ is like the postmortem probability of Section 3.1, save for the fact that Y can now take a range of values y, instead of it being zero or one.

To assess an item’s survival probability were an imprecisely specified anomaly to be declared as $Y \in A$, we consider the analogue of Equation (13). Specifically, we have:

$$P(X = 1 \mid Y \in A; \mathcal{H}) \propto P(Y \in A \mid X = 1; \mu_A(y)) \times p, \tag{19}$$

where the middle term is given by Equation (18), and as before, p is our prior probability that (X = 1).

Equation (19) forms the basis of assessing the item’s survival probability when the presence of an anomaly is actually declared, but not the extent of the defect that is believed to result in an anomaly. That is, we are not given the value of y. In this case $P(Y \in A \mid X = 1; \mu_A(y))$ is viewed as the likelihood and the left-hand side of Equation (19) becomes $P(X = 1; Y \in A, \mathcal{H})$, the required probability. Consequently, Equation (19) leads us to

$$P(X = 1; Y \in A, \mathcal{H}) \propto \mathcal{L}(X = 1; Y \in A, \mu_A(y)) \times p, \tag{20}$$

which is our personal probability that (X = 1), given the presence of an anomaly that is vaguely specified.
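Equations (17) to (20) can be evaluated for a discrete Y with a short Python sketch. It assumes, as a hypothetical simplification, that $P(y \in A)$ is the same number q for every y; with q = 1/2 the bracketed term in (17) reduces to $\mu_A(y)$. To normalise (19), the sketch also evaluates (18) for X = 0, which requires postmortem probabilities for failed items as well. The membership values and postmortem probabilities in the example are invented for illustration.

```python
def prob_in_fuzzy_set(mu, p_y, q=0.5):
    """Equations (17)/(18): sum over y of
    [1 + (1 - mu(y)) / mu(y) * (1 - q) / q]^(-1) * P(Y = y [| X]).

    mu  -- dict y -> membership value mu_A(y)
    p_y -- dict y -> P(Y = y) for (17), or P(Y = y | X = x) for (18)
    q   -- P(y in A), assumed (hypothetically) equal for all y, 0 < q < 1
    """
    total = 0.0
    for y, prob in p_y.items():
        m = mu[y]
        # Same value as the bracketed term, rearranged so that mu(y) = 0 is allowed.
        weight = m * q / (m * q + (1.0 - m) * (1.0 - q))
        total += weight * prob
    return total


def survival_given_fuzzy_anomaly(p, mu, p_y_survive, p_y_fail, q=0.5):
    """Equations (19)/(20), normalised with Equation (18) for X = 1 and X = 0."""
    like_survive = prob_in_fuzzy_set(mu, p_y_survive, q)
    like_fail = prob_in_fuzzy_set(mu, p_y_fail, q)
    return like_survive * p / (like_survive * p + like_fail * (1.0 - p))


# Hypothetical discretised scratch depths y = 0, 1, 2, 3
mu = {0: 0.0, 1: 0.3, 2: 0.8, 3: 1.0}
p_y_survive = {0: 0.70, 1: 0.20, 2: 0.08, 3: 0.02}  # P(Y = y | X = 1)
p_y_fail = {0: 0.30, 1: 0.30, 2: 0.25, 3: 0.15}     # P(Y = y | X = 0)
print(survival_given_fuzzy_anomaly(0.9, mu, p_y_survive, p_y_fail))
```

With these assumed inputs, a vaguely specified anomaly lowers the survival probability from 0.9 to roughly 0.75.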


5.  A  reason  to  believe 

Sections 3 and 4 required of us the specification of a conditional probability $P(Y = y \mid X = x; \mathcal{H})$ and the membership function $\mu_A(y)$, $y \in [0, M]$, as a way of dealing with vagueness and anomalies. What if vagueness and other reasons create an unwillingness to specify the conditional probability but a willingness to specify a marginal probability $P(Y = y; \mathcal{H})$?

The notion of “belief” was introduced by Dempster (1967) as a way of dealing with such partial specifications. Dempster’s development is articulated via a key feature of axiomatic probability theory, namely, that in order to induce probability measures from a probability measure space to another measure space it is necessary that the mapping from the former to the latter be a many-to-one map. As an example, a random variable is a many-to-one map. Consequently, its probability distribution function can be induced from the probability measure space on which the random variable is defined. When the mapping is a one-to-many map, as is the case with our anomaly (see Fig. 2), the induced measure will no longer be a probability measure. For a more detailed appreciation of this argument, we refer the reader to Wasserman’s (1990) excellent exposition; parts of it are reproduced in the Appendix. The induced measure not being a probability measure, alternate labels for it become germane. Dempster’s choice of a label is Basic Probability Assignment (BPA).

With respect to the problem at hand, suppose that we are able to elicit personal probabilities of the type $P(Y = y; \mathcal{H})$, y = 1 or 0, as $p_a$ and $(1 - p_a)$, respectively. Given $p_a$ and the mapping of Fig. 2, how may we describe our uncertainty about the survival (or failure) of the item to time $\tau$? That is, how may we express our uncertainty about the event (X = x) for x = 1 or 0?

The “belief function” approach of Dempster starts by noting that the mapping from Y = y to X = x is a one-to-many map. In particular, if $\Gamma$ denotes the mapping from the Y-space to the X-space, then $\Gamma(Y = 1) = \{X = 1, X = 0\}$. That is, the singleton (Y = 1) maps into the set $\{X = 1, X = 0\}$ via the map $\Gamma$; in other words, $\Gamma$ is a set-valued map, and similarly with $\Gamma(Y = 0)$. However, in order to make the essence of our development more transparent, we suppose that $\Gamma(Y = 0) = (X = 1)$. This means that the absence of an anomaly is tantamount to the item’s success. In other words, the mapping from Y = 0 to the X-space is a one-to-one map. Consequently, in Fig. 2, the arc joining the nodes (Y = 0) and (X = 0) needs to be removed.

With the above in place, the next step in the development of the belief function approach is to induce measures of uncertainty from the Y-space to the X-space. Recall that it is only the Y-space that has been endowed with probability as the measure of uncertainty. Since the X-space has only two elements, (X = 1) and (X = 0), $\mathcal{F}(X)$, the σ-field (i.e., the set of all subsets) generated by X, has four elements, namely:

$$\mathcal{F}(X) = \{\emptyset, \{X = 1\}, \{X = 0\}, \{X = 1, X = 0\}\}.$$

With $\Gamma(Y = 1) = \{X = 1, X = 0\}$ and $\Gamma(Y = 0) = (X = 1)$, the induced measure, say m, on $\mathcal{F}(X)$ will be of the form: $m(\emptyset) = 0$, $m(X = 1) = P(Y = 0) = 1 - p_a$, $m(X = 0) = 0$ and $m\{X = 1, X = 0\} = P(Y = 1) = p_a$. Recall that in Dempster’s terminology, the m(•)s constitute a BPA. It is easy to verify that m possesses the following two properties: $m(\emptyset) = 0$, and $\sum_{F \in \mathcal{F}(X)} m(F) = 1$. However, m is not countably additive and thus is not a probability measure. To make m a probability measure we should be prepared to apportion $p_a$ between the events (X = 1) and (X = 0).

Once the BPAs are in place, the belief function induced by the map $\Gamma$ on $\mathcal{F}(X)$ is defined, for any $F, G \in \mathcal{F}(X)$, as

$$\mathrm{bel}(F) = \sum_{G \subseteq F} m(G),$$

and bel(F) is then considered as a quantified measure of uncertainty about F. Thus, for our problem at hand, $\mathrm{bel}(X = 1) = 1 - p_a$, whereas $\mathrm{bel}(X = 0) = 0$; also, $\mathrm{bel}\{X = 1, X = 0\} = 1$.

Dempster has also introduced the dual of the belief function, called the plausibility function, where for any $F \in \mathcal{F}(X)$:

$$\mathrm{pl}(F) = 1 - \mathrm{bel}(F^c),$$

where $F^c$ is the complement of F. For our problem at hand, $\mathrm{pl}(X = 1) = 1$, whereas $\mathrm{pl}(X = 0) = p_a$.
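The BPA, belief and plausibility values above can be checked with a small Python sketch that represents subsets of the X-space as frozensets. The value $p_a = 0.3$ is hypothetical, and the labels "X=1" and "X=0" are just names chosen for the two outcomes.

```python
SURVIVE, FAIL = "X=1", "X=0"
FRAME = frozenset({SURVIVE, FAIL})


def bpa_for_anomaly(p_a):
    """Basic probability assignment induced by Gamma(Y = 0) = {X = 1}
    and Gamma(Y = 1) = {X = 1, X = 0}."""
    return {
        frozenset(): 0.0,
        frozenset({SURVIVE}): 1.0 - p_a,  # mass of Y = 0
        frozenset({FAIL}): 0.0,
        FRAME: p_a,                        # mass of Y = 1, not split between outcomes
    }


def belief(m, F):
    """bel(F) = sum of m(G) over all G contained in F."""
    return sum(mass for G, mass in m.items() if G <= F)


def plausibility(m, F):
    """pl(F) = 1 - bel(F^c)."""
    return 1.0 - belief(m, FRAME - F)


m = bpa_for_anomaly(0.3)  # hypothetical p_a
for F in (frozenset({SURVIVE}), frozenset({FAIL}), FRAME):
    print(sorted(F), belief(m, F), plausibility(m, F))
```

The masses sum to one, yet m(X = 1) + m(X = 0) = 0.7 while m{X = 1, X = 0} = 0.3, so m is not additive; bel and pl reproduce the values 1 − p_a, 0, 1, and 1, p_a given in the text.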

To make these ideas operational, that is, to make a pragmatic use of them, we need to interpret bel(•) and pl(•). Using bets, bel(X = 1) is the most you are willing to pay for a bet on (X = 1): if $\mathrm{bel}(X = 1) = 1 - p_a$, you are willing to pay at most $1 - p_a$ to receive one monetary unit if (X = 1). pl(X = 1) is 1 minus the most you are willing to pay for a bet on $(X = 1)^c$: if $\mathrm{pl}(X = 1) = 1$, you are not willing to pay anything to bet on $(X = 1)^c = (X = 0)$. However, as pointed out by a referee, Walley (1991) has argued that it is misleading to interpret the belief and plausibility functions as betting rates.


5.1.  Summarizing  “beliefs” 

By way of a closure, we claim that the notion of belief, or its dual plausibility, comes into play when joint probabilities of the type $P(Y = 1, X = 1; \mathcal{H})$ cannot be elicited, and when the marginal probabilities of the type $P(Y = 1; \mathcal{H}) = p_a$ cannot be apportioned in a one-to-many map. Intuitively, the uncertainty measure bel(•) seems reasonable; it can be seen as a lower bound on probability. When the mapping under discussion is one-to-one or many-to-one, belief and probability agree, and thus the belief function will obey the rules of probability. We may conclude by saying that there is a price to be paid for not being able to elicit the required conditional probabilities, and the price is to forsake the notion of probability and its accompanying virtues. Dempster has also proposed rules for combining uncertainties, the details about which can be found in Shafer (1976) or in Wasserman (1990).
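The lower-bound reading of bel(•) can be illustrated by apportioning $p_a$ in every way on a grid. Whatever share of the uncommitted mass is given to (X = 1), the resulting probability stays between $\mathrm{bel}(X = 1) = 1 - p_a$ and $\mathrm{pl}(X = 1) = 1$. The Python sketch below assumes $p_a = 0.3$ and a hypothetical apportionment parameter `share`.

```python
def apportioned_probability(p_a, share):
    """P(X = 1) obtained by giving a hypothetical fraction `share` of the
    uncommitted mass p_a = m{X = 1, X = 0} to (X = 1) and the rest to (X = 0)."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0, 1]")
    return (1.0 - p_a) + share * p_a


p_a = 0.3                       # hypothetical elicited P(Y = 1; H)
bel_survive, pl_survive = 1.0 - p_a, 1.0
probs = [apportioned_probability(p_a, k / 10) for k in range(11)]
print(min(probs), max(probs))   # the extremes are attained at share = 0 and share = 1
```

Each choice of `share` is one way of paying the price mentioned above: it restores a probability, but only by adding an assumption that the elicitation did not supply.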
Appendix

Belief  and  plausibility 

Probability, chance and the probability of chance

In order to gain an appreciation of the notion of “belief” and its dual “plausibility,” it is best that we start off with a look at the essentials of measure-theoretic probability. This we do below via the following seven steps, each of which serves as a prelude to the next step. We assume of the reader some familiarity with these steps. From Step 8 onwards, our discussion highlights arguments necessary to motivate the notions of belief and plausibility.

Step 1. Let $(\Omega, \mathcal{F}(\Omega), \mu)$ be a probability measure space, with $\omega$ as an element of $\Omega$, and $\mu$ assessed for all members A of $\mathcal{F}(\Omega)$.

Step 2. Let $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$ be some measure space with x as an element of $\mathcal{X}$. This is our space of interest.

Step 3. Let $B \subset \mathcal{X}$; since $\mathcal{F}(\mathcal{X})$ is a σ-field generated by $\mathcal{X}$, $B \in \mathcal{F}(\mathcal{X})$.

Step 4. Our aim is to endow the space $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$ with a measure that encapsulates our uncertainty about any B, where $B \subset \mathcal{X}$, or about a singleton x, where $x \in \mathcal{X}$, should $\mathcal{X}$ have countably many elements. Ideally, our measure of uncertainty should be a probability.

Step 5. The measure with which we endeavor to endow $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$ should bear some relationship to the measure $\mu$. This is because we have been able to assess probabilities on the space $(\Omega, \mathcal{F}(\Omega))$; i.e., we are prepared to place bets only on members of $\mathcal{F}(\Omega)$.

Step 6. In order to be able to do the above, we should connect the spaces $(\Omega, \mathcal{F}(\Omega), \mu)$ and $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$. This connection can be made in several ways, two of which are indicated below:

(i) a mapping from $\Omega$ as the domain, to $\mathcal{X}$ as the range, or

(ii) a mapping from $\Omega$ as the domain, to $\mathcal{F}(\mathcal{X})$ as the range.

Step 7. The standard approach is 6(i) above; this is what leads us to the notion of a real-valued random variable, say Z.

Specifically, we take $\mathcal{X}$ to be the real line $\mathbb{R}$, or a countably infinite set of integers $I = \{0, \pm 1, \pm 2, \ldots\}$, or a finite set of integers $I_N = \{0, \pm 1, \ldots, \pm N\}$. When $\mathcal{X} = \mathbb{R}$, $\mathcal{F}(\mathcal{X}) = \mathcal{B}(\mathbb{R})$, the Borel sets of $\mathbb{R}$. When $\mathcal{X} = I_N$, then $\mathcal{F}(\mathcal{X})$ is the power set of $I_N$.

Suppose that $\mathcal{X} = \mathbb{R}$. Then Z is a mapping with domain $\Omega$ and range $\mathbb{R}$. Furthermore, Z is a many-to-one map from $\Omega$ to $\mathbb{R}$. Specifically, for every $\omega \in \Omega$, there is one and only one $Z(\omega)$, and $Z(\omega) \in \mathbb{R}$. However, we do allow for the possibility that for any two (or more) $\omega_1, \omega_2 \in \Omega$, $Z(\omega_1) = Z(\omega_2)$.

Now, a (fortunate) consequence of the many-to-one map Z is that such a map is able to induce a probability measure, say $\mu^*$, on $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$ (or, to put it more correctly, on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$). Specifically, for any $a \in \mathbb{R}$, the set $(Z(\omega) \le a) \in \mathcal{F}(\mathcal{X})$, and

$$\mu^*(Z(\omega) \le a) = \mu\{\omega \in \Omega : Z(\omega) \le a\}$$

is a probability measure of the set $(Z(\omega) \le a)$. Consequently, we now have a probability measure space $(\mathcal{X}, \mathcal{F}(\mathcal{X}), \mu^*)$ in addition to our original probability measure space $(\Omega, \mathcal{F}(\Omega), \mu)$.

Thus, with a many-to-one map, we are able to describe our uncertainties about events of interest in $\mathcal{F}(\mathcal{X})$ via a probability $\mu^*$, with $\mu^*$ being based on $\mu$.
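Step 7 can be made concrete on a finite sample space. The Python sketch below uses a hypothetical fair die for $(\Omega, \mathcal{F}(\Omega), \mu)$ and a many-to-one map Z that sends pairs of faces to the same value; the induced distribution function $\mu^*(Z \le a)$ is computed exactly with fractions.

```python
from fractions import Fraction

# Hypothetical (Omega, F(Omega), mu): a fair die
OMEGA = [1, 2, 3, 4, 5, 6]
MU = {w: Fraction(1, 6) for w in OMEGA}


def Z(w):
    """Many-to-one map Omega -> R: faces {1,2} -> 0, {3,4} -> 1, {5,6} -> 2."""
    return (w - 1) // 2


def induced_cdf(a):
    """mu*(Z <= a) = mu{omega in Omega : Z(omega) <= a}."""
    return sum((MU[w] for w in OMEGA if Z(w) <= a), Fraction(0))


print([induced_cdf(a) for a in (-1, 0, 1, 2)])
```

Because every face has exactly one image, no mass ever has to be split, which is why the induced $\mu^*$ is automatically a probability.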

Step 8. Suppose now that the connection between the spaces $(\Omega, \mathcal{F}(\Omega), \mu)$ and $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$ is established via a mapping $\Gamma$ whose domain is $\Omega$ (as before) but whose range is $\mathcal{F}(\mathcal{X})$ instead of $\mathcal{X}$. That is, $\Gamma: \Omega \to \mathcal{F}(\mathcal{X})$. More specifically, for every $\omega \in \Omega$, $\Gamma(\omega) = B$, where $B \in \mathcal{F}(\mathcal{X})$.

If we assume that the above mapping is many-to-one, in the sense that every $\omega \in \Omega$ gets mapped to one and only one set B (where B may or may not be a singleton), then this mapping is known as a many-to-one set-valued map. When such is the case, $\Gamma$ is also able to induce a probability measure, say $\mu^{**}$, on the space $(\mathcal{F}(\mathcal{X}), \mathcal{F}(\mathcal{F}(\mathcal{X})), \mu^{**})$, where $\mathcal{F}(\mathcal{F}(\mathcal{X}))$ is a σ-field of sets generated by $\mathcal{F}(\mathcal{X})$. Consequently, for any set $C \in \mathcal{F}(\mathcal{F}(\mathcal{X}))$:

$$\mu^{**}(C) = \mu\{\omega \in \Omega : \Gamma(\omega) = C\}.$$

Thus, to summarize, a many-to-one set-valued map is also able to induce a probability measure $\mu^{**}$ on the space $(\mathcal{F}(\mathcal{X}), \mathcal{F}(\mathcal{F}(\mathcal{X})))$, assuming that the latter space is of interest to us. But what about the space $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$? This, after all, is our space of interest.
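Step 8 can be sketched in the same way. Below, a hypothetical many-to-one set-valued map $\Gamma$ on a three-point $\Omega$ induces $\mu^{**}$ on subsets of $\mathcal{X} = \{x_1, x_2\}$. The sketch computes $\mu^{**}$ exactly, but it has no rule for splitting the mass on a non-singleton set among its points, which is the issue taken up in Step 9.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical three-point Omega with its probability mu
MU = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}

# Many-to-one set-valued map: each omega goes to exactly one subset of X = {x1, x2}
GAMMA = {
    "w1": frozenset({"x1"}),
    "w2": frozenset({"x1", "x2"}),
    "w3": frozenset({"x1", "x2"}),
}


def induced_on_sets(mu, gamma):
    """mu**(C) = mu{omega : Gamma(omega) = C} for each set C in the range of Gamma."""
    mu_star2 = defaultdict(Fraction)
    for w, prob in mu.items():
        mu_star2[gamma[w]] += prob
    return dict(mu_star2)


print(induced_on_sets(MU, GAMMA))
# The mass 1/2 on {x1, x2} is well defined, but nothing here says how to split it.
```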

Step 9. The fact that $\Gamma$ is a many-to-one set-valued map on $\mathcal{F}(\mathcal{X})$ is tantamount to the fact that $\Gamma$ is a many-to-many point-valued map on $\mathcal{X}$. In particular, if $\mathcal{X} = \mathbb{R}$ and $\mathcal{F}(\mathcal{X}) = \mathcal{B}(\mathbb{R})$, then $\Gamma$ is a many-to-many real-valued map on $\mathbb{R}$. Consequently, for every $\omega \in \Omega$, $\Gamma(\omega)$ can take any and all values in an interval, say I, where $I \in \mathcal{B}(\mathbb{R})$. Inducing a probability measure on I or any subset of I boils down to smearing $\mu(\omega)$, the probability measure on $\omega$, over I. How should this measure be smeared? What if one is unwilling to specify a strategy for smearing (or distributing) $\mu(\omega)$ over I? When such is the case we are unable to induce a probability measure from the space $(\Omega, \mathcal{F}(\Omega), \mu)$ to $(\mathcal{X}, \mathcal{F}(\mathcal{X}))$. As a consequence, an alternative measure called plausibility, abbreviated pl(•), has been proposed on $\mathcal{F}(\mathcal{X})$. But before examining pl(•), it may be useful to better articulate this matter of smearing $\mu(\omega)$ by looking at a special case of I, namely an I consisting of a finite number of elements, say two; denote these by $\{x_1, x_2\}$. Suppose that $\Gamma^{-1}\{x_1, x_2\} = \omega$; then $\mu(\omega)$ is the induced probability measure of $\{x_1, x_2\}$. However, to induce a probability measure on $x_1$ or $x_2$, we need to split (apportion) $\mu(\omega)$ in some logical and meaningful manner.

To summarize, whenever the map connecting two measure spaces is a many-to-one set-valued map, or a many-to-one point-valued map, a probability measure can always be induced from the domain space to the range space. Probability measures cannot be induced when the mapping is a one-to-many, or a many-to-many, point-valued map, unless additional assumptions are made. When such assumptions cannot be made, a compromise has to be struck and upper and lower probabilities enter the fray of uncertainty assessment. These are discussed below.

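Step 10. Consider the subset B of $\mathcal{X}$. Suppose that there does not exist an induced probability measure from $(\Omega, \mathcal{F}(\Omega), \mu)$ to B. That is, there is no $\omega \in \Omega$ such that $\Gamma(\omega) = B$.

Now consider a set $C \in \mathcal{F}(\mathcal{F}(\mathcal{X}))$ with the feature that $C \cap B \neq \emptyset$; suppose that C is the only set in $\mathcal{F}(\mathcal{F}(\mathcal{X}))$ that intersects with B. Since $C \in \mathcal{F}(\mathcal{F}(\mathcal{X}))$, $\mu^{**}(C)$ is known. Let $\omega_1, \omega_2, \ldots, \omega_n$ be such that $\Gamma(\omega_i) = C$, $i = 1, \ldots, n$. Then the plausibility of B, denoted pl(B), is the (probability) measure $\mathrm{pl}(B) = \mu\{\omega_1, \ldots, \omega_n\}$. Alternatively put,

$$\mathrm{pl}(B) = \mu\{\omega \in \Omega : \Gamma(\omega) = C \text{ and } B \cap C \neq \emptyset\}.$$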

The above expression generalizes when more than one set intersects B. For example, suppose that $B \cap C_i \neq \emptyset$ for $i = 1, \ldots, k$, with $C_i \in \mathcal{F}(\mathcal{F}(\mathcal{X}))$. Then:

$$\mathrm{pl}(B) = \mu\{\omega \in \Omega : \Gamma(\omega) = C_i \text{ and } B \cap C_i \neq \emptyset,\ i = 1, \ldots, k\}.$$

Since there are several sets $C_i$ that intersect with B, there are overlapping $\omega$s in the definition of pl(B). Consequently, it is also called an “upper probability.”

Step 11. A notion dual to pl(•), in a sense to be explained later, is bel(•); here

$$\mathrm{bel}(B) = \mu\{\omega \in \Omega : \Gamma(\omega) = C_i,\ C_i \subseteq B,\ i = 1, \ldots, k\}.$$

Here bel(B) is a lower probability, with $0 \le \mathrm{bel}(B) \le \mathrm{pl}(B) \le 1$. Also, $\mathrm{bel}(B) = 1 - \mathrm{pl}(B^c)$.

The measures pl(•) and bel(•) are not probability measures; in particular, for disjoint sets A and B,

$$\mathrm{bel}(A \cup B) \ge \mathrm{bel}(A) + \mathrm{bel}(B);$$

i.e., because of an overlap of $\omega$s, bel(•) is superadditive.
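Steps 10 and 11 can be illustrated with a hypothetical set-valued map on a two-point $\mathcal{X}$. In the Python sketch below, pl(B) adds the mass of every $\omega$ whose image meets B, and bel(B) the mass of every $\omega$ whose image lies inside B. The printed values confirm $\mathrm{bel}(B) = 1 - \mathrm{pl}(B^c)$ and the superadditivity of bel on the disjoint sets $\{x_1\}$ and $\{x_2\}$.

```python
from fractions import Fraction

MU = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}
# Hypothetical set-valued map; only w3 leaves its mass uncommitted between x1 and x2
GAMMA = {"w1": frozenset({"x1"}), "w2": frozenset({"x2"}), "w3": frozenset({"x1", "x2"})}
X_SPACE = frozenset({"x1", "x2"})


def pl(B):
    """Upper probability: mass of every omega whose image intersects B (Step 10)."""
    return sum((MU[w] for w, C in GAMMA.items() if C & B), Fraction(0))


def bel(B):
    """Lower probability: mass of every omega whose image lies inside B (Step 11)."""
    return sum((MU[w] for w, C in GAMMA.items() if C <= B), Fraction(0))


A, B = frozenset({"x1"}), frozenset({"x2"})
print(bel(A), pl(A), bel(B), pl(B), bel(A | B))
```

Here bel({x1}) + bel({x2}) = 5/6 < 1 = bel({x1, x2}); the missing 1/6 is exactly the mass of w3 that no single point can claim.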
Summary

The state of the art in coherent structure theory is driven by two assertions, both of which are limiting: (1) all units of a system can exist in one of two states, failed or functioning; and (2) at any point in time, each unit can exist in only one of the above states. In actuality, units can exist in more than two states, and it is possible that a unit can simultaneously exist in more than one state. This latter feature is a consequence of the view that it may not be possible to precisely define the subsets of a set of states; such subsets are called vague. The first limitation has been addressed via work labeled ‘multistate systems’; however, this work has not capitalized on the mathematics of many-valued propositions in logic. Here, we invoke its truth tables to define the structure function of multistate systems and then harness our results in the context of vagueness. A key contribution of this paper is to argue that many-valued logic is a common platform for studying both multistate and vague systems but, to do so, it is necessary to lean on several principles of statistical inference.

Key  words:  Consistency  profile;  likelihood  function;  membership  functions;  reliability;  probability; 
maintenance management; natural language; degradation modelling; decision making and utility.


1  Introduction  and  Overview 

The calculus of coherent systems, innovated by Birnbaum et al. (1961), has served as a mathematical foundation for a theory of systems. Here, one explores the effect that a system’s
components  have  on  the  system.  The  bulk  of  the  effort,  however,  has  been  devoted  to  the  case 
of  binary  states  with  precise  classification.  That  is,  the  components  and  the  system  can  (at  any 
point  in  time)  be  in  one  of  two  unambiguously  defined  states,  functioning  or  failed.  In  actuality, 
items  can  function  in  degraded  states,  and  these  could  be  a  discrete  set  or  a  continuum  of  states. 
An  example  of  the  former  is  a  load-sharing  system,  like  a  transmission  line  for  power  with  r 
strands.  As  the  strands  break,  the  rope  transitions  from  its  ideal  load  carrying  capability  to  its 
complete  disintegration  (Smith,  1983).  An  example  of  the  latter  is  a  precipitator  for  reducing  air 
pollution  whose  cleaning  efficiency  ranges  from  (almost)  100  to  0%  (Matland  &  Singpurwalla, 
1981).  Systems  that  can  exist  in  more  than  two  states  are  called  multistate  systems. 

There  are  two  interrelated  aims  to  this  paper.  The  first  is  to  contribute  to  the  mathematics 
of  multistate  systems  with  precise  classification  via  many-valued  logic.  To  set  the  stage  for 
this,  we  overview  some  key  notions  and  results  in  the  reliability  theory  of  binary  systems. 

©  2008  The  Authors.  Journal  compilation  ©  2008  International  Statistical  Institute.  Published  by  Blackwell  Publishing  Ltd,  9600  Garsington  Road, 
Section 1.2 is archival; however, Section 1.3 is current in the sense that it incorporates the view that, when discussing system reliability, one needs to distinguish between probability (which is personal) and propensity (which is physical), and that the assumption of independence is conditional upon propensities. The second aim of this paper is to argue that multivalued logic also provides a framework for assessing the reliability of binary or multistate systems with imprecise classification. Imprecision (or vagueness) is articulated in Section 1.4; Section 1.5 is a guide to the rest of this paper.

1.1 Preamble: Notation and Terminology

Consider a system with $n$ components. The system and each of its components can exist in several states in $S \subseteq [0, 1]$. Let $X_i$, $i = 1, \ldots, n$, denote the state of component $i$ at time $t \ge 0$, and denote $\mathbf{X} = (X_1, \ldots, X_n)$. Binary systems are those for which $S = \{0, 1\}$, where 1 (0) denotes a functioning (failed) state. The state of the system is a function of $\mathbf{X}$, called the ‘structure function’. We denote by $\phi(\mathbf{X})$ the structure function for a binary system. The structure function for a system with multiple states will be denoted by $\psi(\mathbf{X})$. We assume that the component and system states belong to the same set $S$; e.g. $X_i \in S$ and $\psi(\mathbf{X}) \in S$. However, it is possible that the $X_i$'s belong to $[0, 1]$ whereas $\phi(\mathbf{X})$ can only take values in $\{0, 1\}$.

1.2  The  Calculus  of  Binary  Systems  with  Precise  Classification 

The following is an overview of the calculus of binary systems (Barlow & Proschan, 1975); we generalize this construction in Sections 3 and 4. Let $S = \{0, 1\}$, with $X_i = 1$ (0) if component $i$ functions (fails), $i = 1, \ldots, n$; similarly, $\phi(\mathbf{X}) : S^n \to S$ equals 1 (0) if the system functions (fails). $\phi$ is a binary coherent system if (1) $\phi$ is non-decreasing in each argument of $\mathbf{X}$, and (2) each component is relevant. Examples of binary coherent systems are a series system, a parallel redundant system, and a $k$-out-of-$n$ system. The dual of a binary coherent system $\phi(\mathbf{X})$ is defined as $\phi^D(\mathbf{X}) = 1 - \phi(\mathbf{1} - \mathbf{X})$, where $\mathbf{1} - \mathbf{X} = (1 - X_1, 1 - X_2, \ldots, 1 - X_n)$. Any binary structure function $\phi$ with $n$ components can be decomposed as $\phi(\mathbf{X}) = X_i\,\phi(1_i, \mathbf{X}) + (1 - X_i)\,\phi(0_i, \mathbf{X})$, for all $\mathbf{X}$, $i = 1, \ldots, n$; this is later referred to as the pivotal decomposition. The following notation, definitions and theorems are conventional (Barlow & Proschan, 1975):

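$$\mathbf{X} \cdot \mathbf{Y} = (X_1 Y_1, X_2 Y_2, \ldots, X_n Y_n),$$

$$\mathbf{X} \amalg \mathbf{Y} = (X_1 \amalg Y_1, X_2 \amalg Y_2, \ldots, X_n \amalg Y_n),$$

where $X_i \amalg Y_i = 1 - (1 - X_i)(1 - Y_i)$, $i = 1, 2, \ldots, n$.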

Theorem 1: For any binary coherent system $\phi$, $\phi_S(\mathbf{X}) = \prod_{i=1}^{n} X_i \le \phi(\mathbf{X}) \le \coprod_{i=1}^{n} X_i = \phi_P(\mathbf{X})$.

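Theorem 2: For any binary coherent system $\phi$,

$$\phi(\mathbf{X} \amalg \mathbf{Y}) \ge \phi(\mathbf{X}) \amalg \phi(\mathbf{Y}) \qquad (1)$$

and

$$\phi(\mathbf{X} \cdot \mathbf{Y}) \le \phi(\mathbf{X}) \cdot \phi(\mathbf{Y}), \qquad (2)$$

with equality holding for all $\mathbf{X}$ and $\mathbf{Y}$ in equation (1) (equation (2)) if and only if the structure function $\phi$ is $\phi_P$ ($\phi_S$). Proofs of Theorems 1 and 2 can be found in Barlow & Proschan (1975).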
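A brute-force check of these binary results can make them concrete. This minimal Python sketch uses a 2-out-of-3 system as an illustrative coherent structure (our choice) and enumerates all of $\{0, 1\}^n$ to test the pivotal decomposition, Theorem 1 and Theorem 2; the helper names are hypothetical.

```python
from itertools import product
from math import prod


def series(x):                       # phi_S(X) = prod_i X_i
    return prod(x)


def parallel(x):                     # phi_P(X) = coprod_i X_i
    return 1 - prod(1 - xi for xi in x)


def k_out_of_n(k):                   # functions iff at least k components function
    return lambda x: int(sum(x) >= k)


def dual(phi):                       # phi^D(X) = 1 - phi(1 - X)
    return lambda x: 1 - phi(tuple(1 - xi for xi in x))


def coprod(a, b):                    # a coprod b = 1 - (1 - a)(1 - b)
    return 1 - (1 - a) * (1 - b)


def pivotal_ok(phi, n):
    """phi(X) = X_i phi(1_i, X) + (1 - X_i) phi(0_i, X) for every X and i."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            one, zero = x[:i] + (1,) + x[i + 1:], x[:i] + (0,) + x[i + 1:]
            if phi(x) != x[i] * phi(one) + (1 - x[i]) * phi(zero):
                return False
    return True


def theorems_ok(phi, n):
    states = list(product((0, 1), repeat=n))
    for x in states:
        if not series(x) <= phi(x) <= parallel(x):                        # Theorem 1
            return False
        for y in states:
            if phi(tuple(map(coprod, x, y))) < coprod(phi(x), phi(y)):    # (1)
                return False
            if phi(tuple(a * b for a, b in zip(x, y))) > phi(x) * phi(y):  # (2)
                return False
    return True


two_of_three = k_out_of_n(2)
print(pivotal_ok(two_of_three, 3), theorems_ok(two_of_three, 3))  # True True
```

The same helpers confirm that the dual of a parallel system is a series system.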


International Statistical Review (2008), 76, 2, 247-267



Many-valued  Logic  in  Multistate  and  Vague  Stochastic  Systems 




1.3  Reliability  of  Binary  Systems 

Suppose that the $X_i$'s are exchangeable, and that $p_i$ is the propensity of $X_i$ being 1; that is, $p_i = \lim_{N \to \infty} \frac{1}{N}\sum_{j=1}^{N} X_{ij}$, the limiting proportion of 1s in an exchangeable sequence $X_{i1}, X_{i2}, \ldots$ [cf. Lindley & Singpurwalla (2002) or Spizzichino (2001)]. Then, conditional on $p_i$, our subjective probability that $X_i = 1$ is $p_i$, $i = 1, \ldots, n$. Unconditionally, $P(X_i = 1) = \int_0^1 p_i\,\pi(p_i)\,dp_i = E(p_i)$, where $\pi(p_i)$ encapsulates our uncertainty about the propensity, i.e. $\pi(p_i)$ is our subjective probability (density) of $p_i$. The notions of propensity and subjective probability are articulated in de Finetti’s theorem on exchangeable Bernoulli sequences; see Lindley & Phillips (1976).
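To see that $P(X_i = 1) = E(p_i)$, one can integrate numerically. The sketch assumes, purely for illustration, a Beta($a$, $b$) density for $\pi(p_i)$ (the source does not specify one); for that choice $E(p_i) = a/(a+b)$, so the midpoint-rule integral should match it.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density; the Beta form of pi(p_i) is an illustrative assumption."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))


def prob_functioning(a, b, steps=20_000):
    """Midpoint rule for P(X_i = 1) = int_0^1 p * pi(p) dp = E(p_i)."""
    h = 1.0 / steps
    return h * sum((k + 0.5) * h * beta_pdf((k + 0.5) * h, a, b) for k in range(steps))


a, b = 9.0, 1.0
print(round(prob_functioning(a, b), 4), a / (a + b))  # 0.9 0.9
```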

Much of the literature on the reliability of binary coherent systems is conditional on the $p_i$'s. An exception is Lynn et al. (1998), in which the analysis is based on averaging out $p_1, \ldots, p_n$ with respect to a joint distribution.

Conditional on $\mathbf{p} = (p_1, \ldots, p_n)$, the reliability of the system is a function of $\mathbf{p}$, say $h(\mathbf{p})$, but only if the $X_i$'s are (conditionally) independent; i.e. (1) given $\mathbf{p} = (p_1, p_2, \ldots, p_n)$, $X_i$ and $X_j$ are independent, $\forall\, i \neq j$, and (2) given $p_i$, $X_i$ is independent of $p_j$, $\forall\, j \neq i$. Consequently, $P(\phi(\mathbf{X}) = 1 \mid \mathbf{p}) = E(\phi(\mathbf{X}) \mid \mathbf{p}) = h(\mathbf{p})$.

Analogues of the pivotal decomposition and Theorems 1 and 2 follow, asserting that the reliability of any binary coherent system is bounded below (above) by that of a series (parallel) system, if the $X_i$'s are conditionally (given $\mathbf{p}$) independent, and that redundancy at the component level is superior to redundancy at the system level when the systems are connected in parallel, and vice versa if in series; see Barlow & Proschan (1975).
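The reliability function $h(\mathbf{p})$ can be computed by summing $\phi(\mathbf{x})$ over all binary state vectors, each weighted by its conditionally independent Bernoulli probability. This Python sketch uses a 2-out-of-3 system and hypothetical propensities (0.9, 0.8, 0.7), both our assumptions, and compares $h(\mathbf{p})$ with the series and parallel bounds.

```python
from itertools import product
from math import prod


def reliability(phi, p):
    """h(p) = E[phi(X) | p], assuming X_i | p are independent Bernoulli(p_i)."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        weight = prod(pi if xi else 1 - pi for xi, pi in zip(x, p))
        total += weight * phi(x)
    return total


two_of_three = lambda x: int(sum(x) >= 2)   # illustrative architecture
p = (0.9, 0.8, 0.7)                          # hypothetical propensities
h = reliability(two_of_three, p)
lower = prod(p)                              # series system
upper = 1 - prod(1 - pi for pi in p)         # parallel system
print(round(lower, 3), round(h, 3), round(upper, 3))  # 0.504 0.902 0.994
```

For a 2-out-of-3 system, $h(\mathbf{p}) = p_1p_2 + p_1p_3 + p_2p_3 - 2p_1p_2p_3$, which gives 0.902 here.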

1.4 Vagueness or Imprecision

For purposes of discussion, consider a generic element of $S = [0, 1]$, say $x$. At any point in time, we may be able to inspect the system and declare that $\psi(\mathbf{X}) = x$. If we are able to place this $x$ in a well-defined subset of $S$, then we say that the states of the system can be classified with precision. There are scenarios, however, where the identification of a state can be done unambiguously but the classification cannot; this is the case of classification with ‘vagueness’.

In  the  context  of  coherent  systems,  vagueness  is  not  synonymous  with  uncertainty  of 
performance.  Uncertainty  of  performance  is  lack  of  knowledge  about  the  future  state  of  the 
system, e.g. will the system be functioning 5 hours from now? Vagueness pertains to uncertainty about classification, i.e. an inability to place an outcome $x$ in a subset of $S$ because the boundaries of the subset cannot be sharply delineated. Some examples illustrate this point.

Suppose that $S = \{0, 1, \ldots, 10\}$, with each element representing a state in which the system can exist, ranging from the ideal at 10 to the undesirable at 0. Then what is the subset of ‘good states’ in $S$? This subset is not well defined; for example, is 7 a good state? If $S$ were to be partitioned into ‘good’ and ‘bad’ states, such partitioning being a feature of natural language (Zadeh, 1965), would 5 qualify as a good state or a bad state? More likely, 5 qualifies as both a good state and a bad state. Thus if $\psi(\mathbf{X}) = 5$, then the state of the system is simultaneously good and bad. As another scenario, consider an automobile that has 3000 miles on it. Should this automobile be classified as a ‘new’ or a ‘used’ car? The question of classification arises in the contexts of setting insurance rates, taxation and warranties. The subset of miles that go into classifying a car as being ‘new’ is not sharply defined; it is imprecise. Most cars sold as being new have anywhere from 20 to 100 miles (perhaps even more) on them. In actual practice, decisions are often made on the basis of vague knowledge that is relevant, e.g. decisions about health care, maintenance and replacements (see Section 6). As another illustration, medical treatments are based on classifications such as ‘high blood pressure’ or ‘bad cholesterol’, and such classifications fluctuate due to the subjectivity of interpretation between ‘good’ and ‘bad’. The philosopher Black (1939) gives examples from other sciences. Of historical note is the famous example of Schrödinger’s Cat [cf. Pagels (1982), p. 125] from quantum physics. Schrödinger’s thought experiment pertains to a cat in a sealed radioactive box in outer space which, according to one school of thought, is simultaneously alive and dead. Examples from the statistical sciences wherein vague knowledge is relevant are most likely to arise in behavioral and social contexts, such as inferences based on political polling, and medical decisions based on quality of life questionnaires (Cox et al., 1992), wherein responses almost always tend to be vague.

The existing theory of both binary and multistate coherent systems, with precise classification as its underlying premise, is unable to deal with the types of scenarios mentioned above. Some other concerns have been voiced by Marshall (1994). One of his ideas, namely to classify states by more than one criterion, precedes ours, and we applaud him for this foresight; it makes a case for the viewpoint espoused here.

1.5  Overview  of  Paper 

In Section 2, we give a synopsis of many-valued logic, including its connectives of negation, conjunction, disjunction, implication, and equivalence. In Section 3, we extend the material of Section 1.2 to the case of multistate systems, i.e. to those components and systems where $S$ consists of more than two elements. Here, we invoke Lukasiewicz’s (1930) many-valued logic to define the structure function of multistate systems, and arrive at results that are in agreement with those currently available. The material of Section 3 serves two purposes. First, it shows how many-valued logic provides a common platform from which the material on multistate systems can be seen. Second, it sets the stage for developing the material of Sections 4 and 5, which is entirely new. Our use of many-valued logic is unlike that of Baxter (1984), El-Neweihi et al. (1978) and Griffith (1980), whose development centres around binary logic.

Sections 4 and 5 pertain to the scenarios wherein the classification of component and system states is vague. In both sections, $S$ consists of two vague subsets, and these serve as an analogue to binary state systems with precise classification. A key tool here is the ‘consistency profile’ introduced by Black (1939). Zadeh’s (1965) ‘membership function’ parallels the notion of a consistency profile. The harnessing of Lukasiewicz’s many-valued logic with Black’s consistency profile provides a vehicle for the treatment of vague coherent systems. To do so, however, we need to lean on aspects of statistical inference and the statistical treatment of expert testimonies.

Section 6 relates the material of Sections 4 and 5 to decision making in maintenance management using natural language. Section 7 concludes the paper.

2  Many-valued  Logic:  An  Overview 

Binary logic, upon whose foundation the theory of coherent structures has been developed, pertains to propositions that adhere to the ‘Law of Bivalence’ (or the ‘Law of the Excluded Middle’): all propositions are either true or false. Lukasiewicz (1930) recognized the existence of propositions that can be both true and false simultaneously, and thus modified the calculus of binary propositions to develop a calculus of three-valued propositions. Alternatives to Lukasiewicz’s three-valued logic exist; however, for us, Lukasiewicz’s proposal is the most appealing.

It is important to distinguish between the calculus of probability and the calculus of three-valued logic. Probability pertains to the quantification of uncertainty about events (or propositions) that adhere to the Law of Bivalence; thus we have, as a part of the calculus of probability, the axiom of additivity. On the other hand, the calculus of many-valued logic is based on a rejection of the Law of Bivalence. The two are therefore different constructs.
Table 1

(a) Truth table for Lukasiewicz’s Y ∧ Z. (b) Truth table for Lukasiewicz’s Y ∨ Z.
Rows: values of proposition Y; columns: values of proposition Z.

(a) Y ∧ Z
Y \ Z    0      1/2    1
0        0      0      0
1/2      0      1/2    1/2
1        0      1/2    1

(b) Y ∨ Z
Y \ Z    0      1/2    1
0        0      1/2    1
1/2      1/2    1/2    1
1        1      1      1

Consider two propositions $Y$ and $Z$, each taking one of three values: 0, $\frac{1}{2}$ and 1. The negation of $Y$ is $Y' = 1 - Y$, as proposed by Lukasiewicz (1930). When the proposition $Y$ takes the value 1 (0) in a truth table, it signals the fact that the proposition is true (false) with certainty. Values of $Y$ intermediate to 1 and 0 signal an uncertainty about the truth or the falsity of $Y$. The value $\frac{1}{2}$ is chosen arbitrarily for convenience; any value between 0 and 1 could have been chosen. The other logical connectives in the three-valued logic of Lukasiewicz are conjunction, disjunction, implication and equivalence, denoted $(Y \wedge Z)$, $(Y \vee Z)$, $(Y \rightarrow Z)$ and $(Y \equiv Z)$, respectively. The truth tables for the first two are given in Table 1, and we refer the interested reader to Malinowski (1993) for further details. Generalizations from the three-valued to the many-valued case, to incorporate propositions that are true or false with various degrees of uncertainty, are straightforward.
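Table 1 can be regenerated from closed-form connectives. In this sketch, negation, conjunction and disjunction follow the text ($1 - Y$, min and max); the implication $\min(1, 1 - Y + Z)$ and the equivalence $1 - |Y - Z|$ are the standard Lukasiewicz definitions, supplied by us because the text defers them to Malinowski (1993). Exact fractions keep the value 1/2 exact.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
VALUES = (Fraction(0), HALF, Fraction(1))   # truth values of the three-valued logic


def neg(y):           # Y' = 1 - Y
    return 1 - y


def conj(y, z):       # Y AND Z = min(Y, Z), Table 1(a)
    return min(y, z)


def disj(y, z):       # Y OR Z = max(Y, Z), Table 1(b)
    return max(y, z)


def impl(y, z):       # standard Lukasiewicz implication: min(1, 1 - Y + Z)
    return min(Fraction(1), 1 - y + z)


def equiv(y, z):      # (Y -> Z) AND (Z -> Y), which equals 1 - |Y - Z|
    return conj(impl(y, z), impl(z, y))


def table(op):
    """Rows indexed by Y, columns by Z, as in Table 1."""
    return [[str(op(y, z)) for z in VALUES] for y in VALUES]


for row in table(conj):
    print(row)
# ['0', '0', '0']
# ['0', '1/2', '1/2']
# ['0', '1/2', '1']
```

Replacing `VALUES` by a finer grid of points in [0, 1] gives the many-valued case with the same functions.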

3 Invoking Many-Valued Logic for Multistate Systems
3.1  Introduction 

The aim of this section is to generalize the case of binary systems with precise classification to systems that can exist in multiple ($m + 1$, with $m > 1$) states. The states are labeled $j/m$, $j = 0, 1, 2, \ldots, m$, with 1 representing a perfect state and 0 the state of total collapse. The intermediate states of degradation range from $1/m$ to $(m - 1)/m$, where $1/m$ is the state which is penultimate to the total failure of the system. Thus, the range of states now takes the form $S = \{j/m;\ j = 0, 1, 2, \ldots, m\}$ and, by allowing $m$ to be infinite, we are able to consider a continuum of degraded states, in which case $S \subseteq [0, 1]$. With $S$ so defined for both the components and the system, what would be the meaningful choices for the structure function when the system has a series, parallel, or $k$-out-of-$n$ architecture?

In the past, several definitions of multistate systems have been proposed. An overview of these is in El-Neweihi et al. (1978) and in Baxter (1984), which, to the best of our knowledge, represent the latest endeavors. Considering the fact that these papers appeared over 20 years ago, one may sense that a satisfactory answer to the above question is available. This may not be true, however, because all the proposed approaches reduce to a representation in terms of binary states and, thus, an adherence to binary logic. As an example, Baxter (1984), following Barlow & Proschan (1975), defines the structure function of a multistate system in terms of the system’s ‘min-path’ and ‘min-cut’ sets, notions which can have an interpretation only within the context of binary systems. By contrast, our proposal here is to use Lukasiewicz’s many-valued logic as a basis for defining the structure function of multistate systems.

Lukasiewicz’s introduction of a third value, namely $\frac{1}{2}$, and his calculus of three-valued logic were prompted by an uncertainty about the truth or the falsity of a proposition. The number $\frac{1}{2}$ did not reflect, in any sense, a degree of uncertainty. Whereas Lukasiewicz did not appear to have any motivation for his many-valued logic other than the need to generalize, the degree-of-uncertainty interpretation provides a vehicle for extending the three-valued logic. With this in mind, we may ask whether Lukasiewicz’s calculus can be directly imported to the scenario of multistate systems when the degraded states can be specified with precision. Our examples of Table 1 illustrating the three-valued logic suggest that this can be done. More importantly, our results are consistent with those given in El-Neweihi et al. (1978). Consequently, the Lukasiewicz logic can be seen as providing a rationale for the existing results on multistate systems, a rationale that has been missing.


3.2  Definition  and  Structural  Properties 

Let $X_i$ denote the state of component $i$, $i = 1, \ldots, n$, and $\psi = \psi(\mathbf{X})$ the state of the multistate system, where $\mathbf{X} = (X_1, \ldots, X_n)$. The $X_i$'s and $\psi(\mathbf{X})$ take values in $S = \{j/m;\ j = 0, 1, \ldots, m\}$.

Definition 1: (Griffith, 1980) $\psi$ is a multistate coherent system if

1. $\psi$ is non-decreasing in each argument of $\mathbf{X}$,

2. for each $i = 1, 2, \ldots, n$, there exist states $0 \le a_i < b_i \le m$ and a state vector $\mathbf{X}$ such that

$$\psi\big((a_i/m)_i, \mathbf{X}\big) < \psi\big((b_i/m)_i, \mathbf{X}\big);$$

that is, each component is relevant, and

3. $\psi(\mathbf{j/m}) = j/m$, where $\mathbf{j/m} = (j/m, \ldots, j/m)$, $j = 0, 1, \ldots, m$.

Properties 1 and 3 of Definition 1 are consistent with those of Barlow & Wu (1978), El-Neweihi et al. (1978) and Natvig (1982). Property 2 generalizes the notion of relevance.

To use the logic of many-valued propositions for multistate systems, it is necessary to order the state vector $\mathbf{X}$. Since each $X_i \in \{j/m;\ j = 0, 1, \ldots, m\}$, we order the $X_i$'s by the values they take. Specifically, let $0 \le X_{(1:n)} \le X_{(2:n)} \le \cdots \le X_{(i:n)} \le \cdots \le X_{(n:n)} \le 1$ denote the ordered values, i.e. $X_{(1:n)}$ is the weakest of all the $n$ components and $X_{(n:n)}$ the strongest. Consequently, from Table 1(a), the structure function of a series system is $\psi_S(\mathbf{X}) = \min_i X_i = X_{(1:n)}$; that is, the performance of a multistate series system is no better than the performance of its weakest component. If $n = 2$, and if each $X_i$ can take only three values $\{0, \frac{1}{2}, 1\}$ with $\frac{1}{2}$ denoting the degraded state, then Table 1(a), with $Y \wedge Z$ replaced by $\psi_S(\mathbf{X})$ and $Y$ ($Z$) replaced by $X_1$ ($X_2$), gives us a table for the states of the system given the states of the components. Figure 1(a) displays the state of $\phi_S(\mathbf{X}) = \phi_S(X_1, X_2)$ when $X_1$ and $X_2$ take binary values, 0 and 1. In contrast, Figure 1(b) shows the behaviour of $\psi_S(\mathbf{X})$ when $X_1$ and $X_2$ are allowed to take all values in the unit interval, showing the effect of continuously degrading components on the structure function. Clearly, $\psi_S(\mathbf{X})$ provides more granularity than $\phi_S(\mathbf{X})$.

For a parallel redundant system, $\psi_P(\mathbf{X}) = \max_i X_i = X_{(n:n)}$; see Table 1(b). This suggests that the performance of a multistate parallel system is no worse than the performance of its strongest component. In the three-valued case, the entries of Table 1(b) provide us with a table for the states of the system given the states of the components, when $n = 2$. The state of $\phi_P(\mathbf{X})$ when $X_1$ and $X_2$ take binary values, 0 and 1, is displayed in Figure 2(a). In contrast, Figure 2(b) shows the behaviour of $\psi_P(\mathbf{X})$ when $X_1$ and $X_2$ take all values in $[0, 1]$. Again, $\psi_P(\mathbf{X})$ provides more granularity than $\phi_P(\mathbf{X})$.

Figure 1. (a) Two-component binary system, $\phi_S(\mathbf{X})$. (b) Two-component system, with continuously degrading components. The coordinates are labeled $(X_1, X_2, \phi_S(\mathbf{X}))$ and $(X_1, X_2, \psi_S(\mathbf{X}))$, respectively.

Figure 2. (a) Two-component binary system, $\phi_P(\mathbf{X})$. (b) Two-component system, $\psi_P(\mathbf{X})$, with continuously degrading components. The coordinates are labeled $(X_1, X_2, \phi_P(\mathbf{X}))$ and $(X_1, X_2, \psi_P(\mathbf{X}))$, respectively.

For multistate $k$-out-of-$n$ systems, we define $\psi_k(\mathbf{X}) = X_{(n-k+1:n)}$; this definition ensures consistency among systems, i.e. $n$-out-of-$n$ systems are denoted $\psi_S(\mathbf{X})$ and 1-out-of-$n$ systems are denoted $\psi_P(\mathbf{X})$. Interestingly, our set-up and definition of a multistate coherent system permits the definition of the dual of a binary coherent system to carry over: with $\psi^D(\mathbf{X}) = 1 - \psi(\mathbf{1} - \mathbf{X})$, the dual of a $k$-out-of-$n$ system is $\psi_k^D(\mathbf{X}) = X_{(k:n)}$, an $(n - k + 1)$-out-of-$n$ system.
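The order-statistic definitions translate directly into code. This Python sketch (helper names hypothetical) implements $\psi_k(\mathbf{X}) = X_{(n-k+1:n)}$ and the dual $1 - \psi(\mathbf{1} - \mathbf{X})$, and checks exhaustively, for $m = 2$ and $n = 4$ (illustrative choices), that the dual of a $k$-out-of-$n$ system is the $(n-k+1)$-out-of-$n$ system.

```python
from fractions import Fraction
from itertools import product


def k_out_of_n(k):
    """Multistate k-out-of-n structure: psi_k(X) = X_(n-k+1:n)."""
    def psi(x):
        n = len(x)
        return sorted(x)[n - k]   # 0-based position of the (n-k+1)-th smallest state
    return psi


def dual(psi):
    """Dual structure, psi^D(X) = 1 - psi(1 - X)."""
    return lambda x: 1 - psi(tuple(1 - xi for xi in x))


m, n = 2, 4
S = [Fraction(j, m) for j in range(m + 1)]                  # states 0, 1/2, 1
x = (Fraction(1, 2), Fraction(1), Fraction(0), Fraction(1, 2))
print(k_out_of_n(n)(x), k_out_of_n(1)(x))                   # series (min), parallel (max): 0 1

# The dual of psi_k should be psi_{n-k+1}, i.e. X_(k:n), for every state vector.
print(all(dual(k_out_of_n(k))(y) == k_out_of_n(n - k + 1)(y)
          for k in range(1, n + 1) for y in product(S, repeat=n)))  # True
```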

In Lemma 1, the pivotal decomposition for binary structure functions is generalized to $(m + 1)$ precise categories through consideration of their associated indicator variables.

Lemma 1: The following identity holds for every $n$-component multistate structure function $\psi$ with precise classification:

$$\psi(\mathbf{X}) = \sum_{j=0}^{m} I_j(X_i)\,\psi\big((j/m)_i, \mathbf{X}\big), \qquad i = 1, \ldots, n,$$

where $I_j(X_i) = 1$ (0) if $X_i = j/m$ ($X_i \neq j/m$).

Proof. Any multistate structure function $\psi(\mathbf{X})$ can be decomposed into a representation that considers the $i$-th component separately from the remaining $(n - 1)$ components. In particular, for the multistate component $i$, $X_i$ takes only one value from $\{0, 1/m, \ldots, 1\}$, so exactly one of the indicators $I_j(X_i)$ equals 1. The result follows.
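Lemma 1 can be verified by enumeration. The sketch assumes, for illustration only, $m = 2$, $n = 3$ and the multistate 2-out-of-3 system $\psi_2(\mathbf{X}) = X_{(2:3)}$ (the median); for every state vector and every pivot component, the indicator-weighted sum should reproduce $\psi(\mathbf{X})$.

```python
from fractions import Fraction
from itertools import product

m, n = 2, 3
S = [Fraction(j, m) for j in range(m + 1)]


def psi(x):
    """Illustrative multistate 2-out-of-3 system: psi_2(X) = X_(2:3), the median."""
    return sorted(x)[1]


def indicator(j, xi):
    """I_j(X_i) = 1 if X_i = j/m, else 0."""
    return 1 if xi == Fraction(j, m) else 0


def pivot(x, i):
    """Right-hand side of Lemma 1: sum_j I_j(X_i) * psi((j/m)_i, X)."""
    return sum(indicator(j, x[i]) * psi(x[:i] + (Fraction(j, m),) + x[i + 1:])
               for j in range(m + 1))


print(all(pivot(x, i) == psi(x) for x in product(S, repeat=n) for i in range(n)))  # True
```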

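Theorems 1 and 2 of Section 1 can be generalized for multistate coherent systems. To do so, we introduce the following additional notation. For $\mathbf{X} = (X_1, \ldots, X_n)$ and $\mathbf{Y} = (Y_1, \ldots, Y_n)$, $\mathbf{X} \le \mathbf{Y}$ if $X_i \le Y_i$ for each $i = 1, \ldots, n$; $\mathbf{X} \vee \mathbf{Y} = (X_1 \vee Y_1, \ldots, X_n \vee Y_n)$ and $\mathbf{X} \wedge \mathbf{Y} = (X_1 \wedge Y_1, \ldots, X_n \wedge Y_n)$ denote the componentwise maximum and minimum. As a generalization of Theorem 1, we have: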
Theorem 3: Let $\psi$ be a multistate coherent system of order $n$, i.e. $\psi$ has $n$ components. Then

$$X_{(1:n)} \le \psi(\mathbf{X}) \le X_{(n:n)}.$$

Theorem 4: Let $\psi$ be a multistate coherent system of order $n$. Then

$$\psi(\mathbf{X} \vee \mathbf{Y}) \ge \psi(\mathbf{X}) \vee \psi(\mathbf{Y}), \qquad (3)$$

$$\psi(\mathbf{X} \wedge \mathbf{Y}) \le \psi(\mathbf{X}) \wedge \psi(\mathbf{Y}). \qquad (4)$$

The equalities in (3) and (4) hold for all $\mathbf{X}$ and $\mathbf{Y}$ if and only if the system’s architecture is parallel and series, respectively.

Thus, for a multistate coherent system, equation (3) reiterates the result that, structurally, component-level redundancy is superior to system-level redundancy, and vice versa in equation (4). Theorems 3 and 4 and Lemma 1 are also in El-Neweihi & Proschan (1984). They are stated here for completeness.

Since $X_{(1:n)} = \psi_S(\mathbf{X})$ and $X_{(n:n)} = \psi_P(\mathbf{X})$, we have the result that the structure function of any multistate coherent structure is bounded by the structure functions of multistate series and parallel systems.
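Theorems 3 and 4 can likewise be checked exhaustively for a small case. In this sketch, $\mathbf{X} \vee \mathbf{Y}$ and $\mathbf{X} \wedge \mathbf{Y}$ are taken as componentwise max and min (our reading of the notation, matching Table 1), and the system is an illustrative 2-out-of-3 median structure with $m = 2$.

```python
from fractions import Fraction
from itertools import product

m, n = 2, 3
S = [Fraction(j, m) for j in range(m + 1)]
states = list(product(S, repeat=n))


def median(x):
    """Illustrative 2-out-of-3 system, psi(X) = X_(2:3)."""
    return sorted(x)[1]


def vee(x, y):        # componentwise X v Y
    return tuple(max(a, b) for a, b in zip(x, y))


def wedge(x, y):      # componentwise X ^ Y
    return tuple(min(a, b) for a, b in zip(x, y))


def check(psi):
    bounds = all(min(x) <= psi(x) <= max(x) for x in states)                         # Theorem 3
    eq3 = all(psi(vee(x, y)) >= max(psi(x), psi(y)) for x in states for y in states)  # (3)
    eq4 = all(psi(wedge(x, y)) <= min(psi(x), psi(y)) for x in states for y in states)  # (4)
    return bounds, eq3, eq4


print(check(median))  # (True, True, True)
```

Passing the built-in `max` (a parallel system) to `check` also returns all True, and for it (3) holds with equality; for the median it does not, e.g. at $\mathbf{X} = (1, 0, 0)$, $\mathbf{Y} = (0, 1, 0)$.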


3.3  Multistate  System  Reliability  under  Precise  Classification 

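Suppose that the component states $X_1, \ldots, X_n$ are (conditionally) independent and identically distributed with $P(X_i = j/m \mid p_{j+1}) = p_{j+1}$, for $i = 1, \ldots, n$ and $j = 0, \ldots, m$, where $p_{j+1} \ge 0$ and $\sum_{j=0}^{m} p_{j+1} = 1$. That is, each $X_i$ has a multinomial distribution over $\{j/m;\ j = 0, 1, 2, \ldots, m\}$ with parameters $p_{j+1}$, $j = 0, \ldots, m$. Let $\mathbf{p} = (p_1, \ldots, p_{m+1})$. Clearly, for each $j$, $P(\psi(\mathbf{X}) = j/m)$ depends on $\mathbf{p}$ alone, since the $X_i$'s are assumed to be conditionally (given $\mathbf{p}$) independent. Thus we let $P(\psi(\mathbf{X}) = j/m \mid \mathbf{p}) = h_j(\mathbf{p})$, where $h_j$ is some function of $\mathbf{p}$. Suppose that the architecture of $\psi$ is an $(n - k + 1)$-out-of-$n$ system, so that $\psi(\mathbf{X}) = X_{(k:n)}$. Then, since $X_{(k:n)} \le j/m$ if and only if at least $k$ components are at level $j/m$ or below,

$$h_j(\mathbf{p}) = \sum_{l=k}^{n}\binom{n}{l}\Big(\sum_{b=1}^{j+1}p_b\Big)^{l}\Big(\sum_{b=j+2}^{m+1}p_b\Big)^{n-l} - \sum_{l=k}^{n}\binom{n}{l}\Big(\sum_{b=1}^{j}p_b\Big)^{l}\Big(\sum_{b=j+1}^{m+1}p_b\Big)^{n-l}, \qquad j = 0, \ldots, m.$$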

Example 1: Let $m = n = 2$. We thus consider a two-component system with three possible states: total failure (0), degradation ($\frac{1}{2}$), and perfect functioning (1), with associated probabilities $p_1$, $p_2$ and $p_3$, respectively. Then the probability that the parallel system is totally failed is $h_0(\mathbf{p}) = P(\psi_P(\mathbf{X}) = 0 \mid \mathbf{p}) = p_1^2$, i.e. the parallel system is totally failed when all its components are totally failed. The probability that a series system totally fails is $h_0(\mathbf{p}) = P(\psi_S(\mathbf{X}) = 0 \mid \mathbf{p}) = 2p_1p_2 + 2p_1p_3 + p_1^2$; thus a series system fails completely when at least one component is totally failed.
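Example 1 can be reproduced numerically. This sketch assumes i.i.d. components and uses the fact that, since $\psi_k(\mathbf{X}) = X_{(n-k+1:n)}$, an $(n-k+1)$-out-of-$n$ system has $\psi(\mathbf{X}) = X_{(k:n)}$, and $X_{(k:n)} \le j/m$ exactly when at least $k$ components are at level $j/m$ or below; $h_j(\mathbf{p})$ is then a difference of two binomial tail sums. The probabilities 0.2, 0.3, 0.5 are hypothetical, and a brute-force enumeration is included as a cross-check.

```python
from itertools import product
from math import comb, isclose, prod


def at_least_k(q, k, n):
    """P(at least k of n i.i.d. components at level j/m or below), q = P(X_i <= j/m)."""
    return sum(comb(n, l) * q**l * (1 - q) ** (n - l) for l in range(k, n + 1))


def h(j, p, n, k):
    """h_j(p) = P(X_(k:n) = j/m | p) for an (n-k+1)-out-of-n system.

    p[b] = P(X_i = b/m) for b = 0..m, i.e. p[b] is p_{b+1} in the text's indexing.
    """
    return at_least_k(sum(p[: j + 1]), k, n) - at_least_k(sum(p[:j]), k, n)


def h_brute(j, p, n, k):
    """Same probability by enumerating all (m+1)^n state vectors (states as indices b)."""
    m = len(p) - 1
    return sum(prod(p[b] for b in x) for x in product(range(m + 1), repeat=n)
               if sorted(x)[k - 1] == j)


p1, p2, p3 = 0.2, 0.3, 0.5          # hypothetical values of p_1, p_2, p_3
p = (p1, p2, p3)
print(isclose(h(0, p, 2, k=2), p1**2))                        # parallel (1-out-of-2): True
print(isclose(h(0, p, 2, k=1), 2*p1*p2 + 2*p1*p3 + p1**2))    # series (2-out-of-2): True
```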

When $X_1, \ldots, X_n$ are independent but not identically distributed, we may generalize the above by introducing $P(X_i = j/m \mid \mathbf{p}_i) = p_{i,j+1}$, $j = 0, \ldots, m$, where for each $i$, $p_{i,j+1} \ge 0$ and $\sum_{j=0}^{m} p_{i,j+1} = 1$. We define $\mathbf{p}_i = (p_{i1}, \ldots, p_{i,m+1})$ to be the reliability vector associated with the $i$-th component and $\mathbf{p} = (\mathbf{p}_1, \ldots, \mathbf{p}_n)$. Given the conditional independence of the $X_i$'s, an $(n - k + 1)$-out-of-$n$ system has


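$$\psi_{n-k+1}(\mathbf{X}) = X_{(k:n)}$$

and

$$h_j(\mathbf{p}) = P\big(\psi_{n-k+1}(\mathbf{X}) = j/m \mid \mathbf{p}\big) = \sum_{a}\Big(\prod_{i \in J_a}\sum_{b=1}^{j+1} p_{ib}\Big)\Big(\prod_{i \in J'_a}\sum_{b=j+2}^{m+1} p_{ib}\Big) - \sum_{a}\Big(\prod_{i \in (J-1)_a}\sum_{b=1}^{j} p_{ib}\Big)\Big(\prod_{i \in (J-1)'_a}\sum_{b=j+1}^{m+1} p_{ib}\Big),$$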


where $J_a$ ranges over the subsets of $\{1, 2, \ldots, n\}$ with at least $k$ elements, the components in $J_a$ performing at level $j/m$ or below and those in its complement $J'_a$ above it. Similarly, $(J - 1)_a$ ranges over the subsets of $\{1, 2, \ldots, n\}$ with at least $k$ elements whose components function at level $(j - 1)/m$ or below, and $(J - 1)'_a$ is the complement of $(J - 1)_a$.

Lemma 2 provides the pivotal decomposition for the reliability function $h_j(\mathbf{p})$.

Lemma 2: The following identity holds for the pivotal decomposition of $h_j(\mathbf{p})$:

$$h_j(\mathbf{p}) = \sum_{a=0}^{m} h_j\big[(a/m)_i, \mathbf{p}\big]\, p_{i,a+1}, \qquad j = 0, \ldots, m;\ i = 1, \ldots, n, \qquad (5)$$

where $h_j[(a/m)_i, \mathbf{p}] = P(\psi(\mathbf{X}) = j/m \mid X_i = a/m, \mathbf{p})$.

Proof. Follows from the Law of Total Probability.
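For independent but non-identical components, the subset sums and Lemma 2 can be checked together. In this sketch (hypothetical reliability vectors, $m = 2$, $n = 3$, a 2-out-of-3 system) $h_j(\mathbf{p})$ is computed by summing over the subsets of components lying at level $j/m$ or below, and $h_j[(a/m)_i, \mathbf{p}]$ by replacing component $i$ with a point mass at $a/m$, which is what conditioning on $X_i = a/m$ amounts to under independence.

```python
from itertools import combinations, product
from math import isclose, prod


def at_least_k_low(p, j, k):
    """P(at least k components at level j/m or below); p[i][b] = P(X_i = b/m)."""
    n = len(p)
    low = [sum(pi[: j + 1]) for pi in p]             # P(X_i <= j/m)
    total = 0.0
    for size in range(k, n + 1):
        for J in combinations(range(n), size):      # components at or below j/m
            total += prod(low[i] if i in J else 1 - low[i] for i in range(n))
    return total


def h(j, p, k):
    """h_j(p) for an (n-k+1)-out-of-n system with independent, non-identical components."""
    return at_least_k_low(p, j, k) - (at_least_k_low(p, j - 1, k) if j > 0 else 0.0)


def h_given(j, p, k, i, a):
    """h_j[(a/m)_i, p]: condition on X_i = a/m by making component i degenerate there."""
    q = list(p)
    q[i] = tuple(1.0 if b == a else 0.0 for b in range(len(p[i])))
    return h(j, q, k)


p = [(0.1, 0.3, 0.6), (0.2, 0.2, 0.6), (0.05, 0.45, 0.5)]  # hypothetical p_i, m = 2
k, i = 2, 0                       # 2-out-of-3 system (k = 2), pivot on the first component
for j in range(3):
    pivot = sum(h_given(j, p, k, i, a) * p[i][a] for a in range(3))  # right side of (5)
    print(j, isclose(h(j, p, k), pivot))  # each line ends in True
```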


4 Components with Imprecise State Classification

Binary  state  systems  with  precise  classification  were  overviewed  in  Section  1.2,  and  the 
concept  of  vagueness  introduced  in  Section  1.4.  Sections  4  and  5  serve  to  combine  these  two 
notions  to  develop  a  mechanism  for  the  treatment  of  vague  coherent  systems,  with  Section  4 
devoted  to  the  case  of  components  in  vague  states,  and  Section  5  to  the  case  of  coherent  systems 
in  vague  states. 

The  terms  ‘coherence’  and  ‘vagueness’  may  seem  contradictory;  however,  they  do  not  pertain 
to  the  same  object.  The  first  is  associated  with  the  truth  values  of  logical  connectives,  whereas 
the  second  pertains  to  the  partitioning  of  a  set  into  subsets.  We  start  with  some  background  on 
vagueness  and  then  discuss  approaches  for  quantifying  it. 

4.1  Vagueness:  General  Background 

Vagueness  has  been  discussed  by  philosophers  like  Bertrand  Russell,  and  by  physicists  like 
Albert  Einstein.  To  Russell  (1923),  ‘all  language  is  more  or  less  vague’  so  that  the  Law  of  the 
Excluded  Middle  ‘is  true  when  precise  symbols  are  employed  but  it  is  not  true  when  symbols 
are  vague,  as,  in  fact,  all  symbols  are.’  Black  (1939)  recognized  the  inability  of  binary  logic 
to  satisfactorily  represent  propositions  that  are  neither  perfectly  true  nor  false.  He  attempted  to 
rectify  this  by  analyzing  the  concept  of  vagueness  in  order  to  establish  an  ‘appropriate  symbolism’ 
by  which  binary  logic  can  be  viewed  as  a  special  case.  Unlike  Lukasiewicz  (1930),  who  was 
also  concerned  about  the  Law  of  the  Excluded  Middle,  Black  did  not  introduce  three-valued 
propositions.  Rather,  he  defined  a  vague  proposition  as  one  where  the  possible  states  of  the 
proposition  are  not  clearly  defined  with  respect  to  inclusion,  and  introduced  the  mechanism  of 
‘consistency  profiles’  as  a  way  of  treating  vagueness.  Black’s  consistency  profile  is  a  graphical 
portrayal  of  the  degree  of  membership  of  some  proposition  in  a  set  of  imprecisely  defined 
states,  with  1  representing  absolute  membership  in  a  state  and  0  an  absolute  lack  of  membership. 
Precise propositions are treated via step functions as consistency profiles, and vague propositions by consistency profiles that tend gradually from one extreme to the other; see Figure 3. The scaling between 0 and 1 is arbitrary; other convenient limits could have been used. Further, the consistency profile, which is specified by an individual or a group of individuals, need not be unique.

Figure 3. Example of consistency profiles: (a) for a precise set; (b) for a vague set. The consistency profile is 0 after x*.

Table 2
Membership table for precise set A1 versus fuzzy set A2

x          1    2    3    4    5    6    7    8    9    10
μ_A1(x)    0    0    0    0    0    0    1    1    1    1
μ_A2(x)    0    0    0    0    0    0.2  0.5  0.9  1    1

4.2 Membership Functions and Probabilities of Fuzzy Sets

Black’s (1939) consistency profile precedes Zadeh’s (1965) membership function. For each x,
a normalized membership function $0 \le \mu_A(x) \le 1$ describes a belief of containment of x in a set
A. When $\mu_A(x)$ takes only the values 1 and 0, A is a crisp (or precise) set; when $0 < \mu_A(x) < 1$
for some x, A is a fuzzy set. To illustrate the concept of a fuzzy set, consider the following example.

Example 2: Let $A_1 = \{x \in \{1, 2, \ldots, 10\} \mid x \ge 7\}$. For any specified x, there is no ambiguity as
to whether x belongs to $A_1$ or not. By definition, $\mu_{A_1}(x) = 1$ when x = 7, 8, 9, or 10; otherwise,
it is zero (see Table 2). Thus $A_1$ is a precise set, since $\mu_{A_1}(x) = 1$ or 0. By contrast, consider the
set $A_2 = \{x \in \{1, 2, \ldots, 10\} \mid x \text{ is large}\}$. The term ‘large’ is vague; thus, we cannot precisely
ascertain the containment of any x in $A_2$. A possible membership function for $A_2$, $\mu_{A_2}(x)$, is
given in Table 2; this assignment is not unique.

For fuzzy sets A and B in a basic set M, with membership functions $\mu_A(x)$ and $\mu_B(x)$,
respectively, Zadeh (1965) defined set operations that parallel those of precise sets. For any
x in a given basic set M,

1. $\mu_{A \cup B}(x) = \max[\mu_A(x), \mu_B(x)]$,

2. $\mu_{A \cap B}(x) = \min[\mu_A(x), \mu_B(x)]$,

3. $\mu_{A'}(x) = 1 - \mu_A(x)$,

4. $A \subseteq B \Leftrightarrow \mu_A(x) \le \mu_B(x)$, and

5. $A = B \Leftrightarrow \mu_A(x) = \mu_B(x)$.

Thus, the union of fuzzy sets A and B is the fuzzy set $A \cup B$, whose membership function is
$\max[\mu_A(x), \mu_B(x)]$; similarly for the intersection and the complement. There is a parallel between
operations with fuzzy sets and the conjunction and disjunction connectives of Lukasiewicz
(1930). In Section 5.1, we use these operations to define structure functions of vague binary
state systems. We therefore claim that Lukasiewicz’s logic provides a unifying framework via which
both multistate and vague systems can be studied.
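Zadeh’s operations act pointwise on membership values, so they are easy to check on the entries of Table 2. The following Python sketch is illustrative only; the dictionary representation and the function names are our own assumptions, not part of Zadeh (1965).

```python
# Membership values transcribed from Table 2 (x = 1, ..., 10).
XS = range(1, 11)
MU_A1 = dict(zip(XS, [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]))         # precise set A1 = {x >= 7}
MU_A2 = dict(zip(XS, [0, 0, 0, 0, 0, 0.2, 0.5, 0.9, 1, 1]))   # fuzzy set A2 = {x is large}


def union(mu_a, mu_b):
    """Zadeh union: pointwise maximum of two membership functions."""
    return {x: max(mu_a[x], mu_b[x]) for x in mu_a}


def intersection(mu_a, mu_b):
    """Zadeh intersection: pointwise minimum of two membership functions."""
    return {x: min(mu_a[x], mu_b[x]) for x in mu_a}


def complement(mu_a):
    """Zadeh complement: 1 - mu_A(x)."""
    return {x: 1 - mu_a[x] for x in mu_a}


def is_subset(mu_a, mu_b):
    """A is contained in B iff mu_A(x) <= mu_B(x) for every x."""
    return all(mu_a[x] <= mu_b[x] for x in mu_a)


if __name__ == "__main__":
    print("A1 union A2:      ", union(MU_A1, MU_A2))
    print("A1 intersect A2:  ", intersection(MU_A1, MU_A2))
    # For the fuzzy set A2, the 'excluded middle' A2 union A2' is not everywhere 1.
    print("A2 union A2' at 7:", union(MU_A2, complement(MU_A2))[7])
```

For the precise set $A_1$ every operation returns only 0s and 1s, as for ordinary sets. For the fuzzy set $A_2$, the membership of x = 7 in $A_2 \cup A_2'$ is max(0.5, 0.5) = 0.5 rather than 1, which is one way of seeing why the Law of the Excluded Middle fails for vague sets. Under item 4, neither $A_1 \subseteq A_2$ nor $A_2 \subseteq A_1$ holds.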
4.2.1 Probabilities of fuzzy sets


In the context of this paper, statistical inference plays a key role. This role comes into effect
when we endow a fuzzy set, say A, with a probability measure. There are two key ideas that drive
this development, namely that (1) vague sets are a consequence of one’s uncertainty about the
boundaries of sharp sets, and (2) the membership function $\mu_A(x)$ is to be interpreted as data (or
information) whose role is to help induce a likelihood function, just like the role of an observation
in traditional statistical inference. These ideas are best exposited by envisioning the
scenario of expert testimonies and information integration that has gained current popularity in
statistical practice (cf. Reese et al., 2004).

Accordingly, we consider the actions of D, an assessor of probabilities (or a decision maker),
who quantifies his (her) uncertainty about any outcome of X, say x, being classified in A via a prior
probability $\pi_D(x \in A)$. The thesis here is that all uncertainties, including those of classification,
be quantified via probability. In order to sharpen the prior probability, D consults an expert,
say Z, and elicits from Z a membership function $\mu_A(x)$. This $\mu_A(x)$ can be seen as additional
information about the nature of x’s membership in A, and de facto serves a role analogous to that
of observed data in statistical inference about outcomes. In essence, observed data are evidence
about outcomes, whereas membership functions are evidence about classification. In principle,
D may consult several experts and elicit a membership function from each as a way to further
sharpen the analysis.

With $\mu_A(x)$ at hand, D constructs his (her) likelihood that $x \in A$; we denote this
likelihood by $\mathcal{L}[x \in A; \mu_A(x)]$. The construction of this likelihood follows standard statistical
procedures for formally incorporating expert testimonies, and should include things such as
D’s view of the expertise of Z and, in the case of several experts, correlations between them
(cf. Lindley, 1991; Clarotti & Lindley, 1988). Since $\mathcal{L}[x \in A; \mu_A(x)]$ is D’s likelihood that Z
declares $\mu_A(x)$ when $x \in A$, the specification of this likelihood is a subjective exercise on the part
of D. Conventionally, in statistical inference, likelihoods for unknown parameters are prescribed
via probability models (for outcomes) using the observed data as fixed quantities. By contrast,
what we have done here is prescribe a likelihood about classification using the membership as
a fixed entity, but without the benefit of a probability model. In so doing, we have interpreted
the likelihood in a broader sense, namely as a weighting function (Basu, 1975). In addition to
$\mathcal{L}[x \in A; \mu_A(x)]$, D also needs to specify $\mathcal{L}[x \notin A; \mu_A(x)]$, which is D’s likelihood that $x \notin A$
when Z declares $\mu_A(x)$, and $P_D(x)$, which is D’s subjective probability that an outcome x will
occur. Thus D needs to specify two probability measures, $P_D(x)$ and $\pi_D(x \in A)$, one for outcomes
and one for classification, and two likelihoods, $\mathcal{L}[x \in A; \mu_A(x)]$ and $\mathcal{L}[x \notin A; \mu_A(x)]$.

With the above in place, D uses standard statistical methodology involving Bayes’ Law, Bayes’
Factors, and prior-to-posterior odds (cf. Kass, 1993) to obtain a probability measure for a fuzzy
set A (cf. Singpurwalla & Booker, 2004) as


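$$P_D[X \in A; \mu_A(x)] = \sum_{x} \left[ 1 + \frac{\mathcal{L}[x \notin A; \mu_A(x)]}{\mathcal{L}[x \in A; \mu_A(x)]} \cdot \frac{\pi_D(x \notin A)}{\pi_D(x \in A)} \right]^{-1} P_D(x). \tag{6}$$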


Equation (6) above is the essence of the material of this section; it plays a key role in
what follows. In obtaining it, we have leaned heavily on the statistical notion of
likelihood and the likelihood ratio. Equation (6) simplifies if D chooses to use Z’s declared
$\mu_A(x)$ as the sole basis for constructing his (her) likelihood, so that $\mathcal{L}[x \in A; \mu_A(x)] = \mu_A(x)$
and $\mathcal{L}[x \notin A; \mu_A(x)] = 1 - \mu_A(x)$. In this case,


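$$P_D[X \in A; \mu_A(x)] = \sum_{x} \left[ 1 + \frac{1 - \mu_A(x)}{\mu_A(x)} \cdot \frac{\pi_D(x \notin A)}{\pi_D(x \in A)} \right]^{-1} P_D(x). \tag{7}$$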
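Equation (6) is a posterior classification probability averaged over D’s outcome probabilities. The Python sketch below evaluates it on a finite outcome space, assuming binary classification so that $\pi_D(x \notin A) = 1 - \pi_D(x \in A)$. The function names, the uniform $P_D(x)$, and the two classification priors are hypothetical choices made for illustration; only the membership values come from Table 2.

```python
from typing import Callable, Dict, Optional


def classification_posterior(lik_in: float, lik_out: float, prior_in: float) -> float:
    """D's posterior probability that x is classified in A once Z declares mu_A(x).

    Algebraically equal to the bracketed term of equation (6),
    [1 + (lik_out / lik_in) * (prior_out / prior_in)]^(-1), with
    prior_out = 1 - prior_in; the product form avoids dividing by zero
    when lik_in or prior_in is 0.
    """
    num = lik_in * prior_in
    den = num + lik_out * (1.0 - prior_in)
    if den == 0.0:
        raise ValueError("posterior undefined: both weighted likelihoods are zero")
    return num / den


def prob_fuzzy_set(
    p_d: Dict[int, float],
    mu: Dict[int, float],
    prior_in: Callable[[int], float],
    lik_in: Optional[Callable[[int], float]] = None,
    lik_out: Optional[Callable[[int], float]] = None,
) -> float:
    """Equation (6) on a finite outcome space: sum over x of posterior(x) * P_D(x).

    When the likelihoods are omitted, L[x in A] = mu_A(x) and
    L[x not in A] = 1 - mu_A(x), which gives equation (7).
    """
    if lik_in is None:
        lik_in = lambda x: mu[x]
    if lik_out is None:
        lik_out = lambda x: 1.0 - mu[x]
    return sum(
        classification_posterior(lik_in(x), lik_out(x), prior_in(x)) * p
        for x, p in p_d.items()
    )


# Hypothetical inputs: mu_A2 from Table 2, a uniform P_D on {1, ..., 10},
# and two hypothetical classification priors pi_D(x in A2).
MU_A2 = dict(zip(range(1, 11), [0, 0, 0, 0, 0, 0.2, 0.5, 0.9, 1, 1]))
P_D = {x: 0.1 for x in range(1, 11)}

p_flat = prob_fuzzy_set(P_D, MU_A2, prior_in=lambda x: 0.5)
p_skeptical = prob_fuzzy_set(P_D, MU_A2, prior_in=lambda x: 0.2)
```

With the flat prior the posterior odds equal the likelihood ratio, so each posterior is just $\mu_{A_2}(x)$ and `p_flat` is the average membership, 3.6/10 = 0.36. The more skeptical prior of 0.2 pulls every intermediate membership down and gives a smaller value.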


International Statistical Review (2008), 76, 2, 247-267

© 2008 The Authors. Journal compilation © 2008 International Statistical Institute




K.F. Sellers & N.D. Singpurwalla


4.2.2 The role of precise and fuzzy data in vague systems

In equations (6) and (7), $P_D(x)$ encapsulates D’s prior uncertainty about an outcome x. Were
D to have at his (her) disposal data on X, say $\mathbf{x} = (x_1, \ldots, x_n)$, then $P_D(x)$ would get replaced by a
posterior probability, say $P_D(x \mid \mathbf{x})$. The calculation of this posterior would be a routine exercise
were D to invoke a probability model for outcomes, and were the actual observations $x_1, \ldots, x_n$
sharp (i.e. precisely stated). What must D do to update $P_D(x)$ if the data $\mathbf{x}$ are themselves fuzzy?

To address this question, we first need to clarify what one means by fuzzy data, a term
that has appeared in several book and article titles; see, for example, Bertoluzza et al. (2002) and
Viertl (2006). If by fuzzy data we mean imprecision of observation (i.e. observation error), then
the treatment of such data can be routinely handled via standard statistical technology, provided
that an error distribution can be specified. The literature on ‘calibration’ adequately deals with
this issue; see, for example, Huang (2002). If by fuzzy data we mean a statement such as ‘the
outcome does or does not belong to the fuzzy set A’, then the incorporation of such information
for updating $P_D(x)$ is no longer a standard matter. In other words, when the actual value taken by
X, say $x_i$, is not declared, but what is declared is whether or not the actual value belongs to A,
an assessment of $P_D(x \mid \text{observed value belongs (does not belong) to } A)$ poses a challenge. This
can be addressed if D can specify a likelihood for $X = x_i$ given the knowledge that the ‘observed value belongs
(does not belong) to A’. The specification of such a likelihood will entail
several issues, such as who provides D with the said knowledge, Z or someone other than Z. If it is
Z, then $\mu_A(x)$ provides some guidance to D about specifying the likelihood. If it is someone
other than Z, then D needs to contemplate the knowledge provider’s actions. These and other
issues remain to be addressed, including the matter of calibrating Z and updating membership
functions.

4.3 Components in Vague Binary States

The notion that units can exist in states that are vaguely defined was introduced in
Section 1.4. Specifically, let X denote the state of a component at some time $\tau > 0$, and let
X take values in $\mathcal{S} = \{x; 0 \le x \le 1\}$, with one representing the perfectly functioning state.
Consider $\mathcal{G} \subset \mathcal{S}$, where $\mathcal{G} = \{x; x \text{ is a ‘desirable’ state}\}$. Suppose that interest centres around
$X \in \mathcal{G}$. Suppose also that we are unable to specify an $x^*$ such that $X \ge x^*$ implies that $X \in \mathcal{G}$
and, otherwise, $X \notin \mathcal{G}$. Thus, the boundary of $\mathcal{G}$ is not sharp; i.e. $\mathcal{G}$ is a fuzzy set. Let $\mu_{\mathcal{G}}(x)$
be the membership function of $\mathcal{G}$. Figure 4 illustrates plausible forms for $\mu_{\mathcal{G}}(x)$. Interest may
centre around $\mathcal{G}$ for several reasons, a relevant one being a desire to use ‘natural language’ for
communication with others on matters such as repair and replacement. Another possibility is
that it may not be possible to observe the actual value of x, but one may be able to make a general
statement about the state of the component.

The complement of $\mathcal{G}$, say $\mathcal{G}^c$, is that fuzzy set whose membership function is $1 - \mu_{\mathcal{G}}(x)$. It
is important to note that, if another subset $B \subset \mathcal{S}$ were defined as $B = \{x; x \text{ is an ‘undesirable’ state}\}$,
then $\mathcal{G}^c$ need not be B unless $\mu_B(x)$, the membership function of B, is such that
$\mu_B(x) = 1 - \mu_{\mathcal{G}}(x)$. In principle, one is free to choose a $\mu_B(x)$ that need not bear a relationship
to $\mu_{\mathcal{G}}(x)$. For example, in Figure 4(a), $\mu_B(x)$ is symmetric to $\mu_{\mathcal{G}}(x)$, whereas in Figure 4(b),
$\mu_B(x)$ and $\mu_{\mathcal{G}}(x)$ are not symmetric. There is precedent in the statistical sciences for choosing
asymmetric likelihood functions. For example, one need not specify likelihood functions that
are symmetrical for competing hypotheses.

Example 3: An assessor D wants to assess the probability that a component will be in a
‘desirable’ state $\mathcal{G}$ at some future time $\tau$. That is, D wishes to specify $P_D[X \in \mathcal{G}; \mu_{\mathcal{G}}(x)]$, where
a membership function of the form $\mu_{\mathcal{G}}(x) = x^4$, $0 \le x \le 1$, has been elicited by D from an expert,
Z. Suppose that $P_D(x)$, D’s personal probability that the state of the component at time $\tau$ will be x,
is of the form given in Figure 5; it is a Beta(6, 2) density. Furthermore, suppose that D’s belief that
nature will classify any x in $\mathcal{G}$, namely $\pi_D(x \in \mathcal{G})$, is of the general form illustrated in Figure 6 with
the label $P_1(x \in \mathcal{G})$. Then, it can be seen, via equation (7), that $P_D[X \in \mathcal{G}; \mu_{\mathcal{G}}(x)] = 0.6605$.

As a consequence, $P_D[X \notin \mathcal{G}; \mu_{\mathcal{G}}(x)] = 1 - 0.6605 = 0.3395$. By contrast, suppose now that
D were to specify $\pi_D(x \in \mathcal{G})$ via the label $P_2(x \in \mathcal{G})$ of Figure 6 and keep everything else the
same; then $P_D[X \in \mathcal{G}; \mu_{\mathcal{G}}(x)]$ would increase to 0.7486. Thus, even a small change in the form
of $\pi_D(x \in \mathcal{G})$ produces a noticeable change in D’s final answer.

Figure 4. Membership functions of $\mathcal{G}$ and B: (a) symmetric case; (b) asymmetric case.

Figure 5. Component state at time $\tau$, $P_D(x)$.

Figure 6. Two possible prior forms of classifying X = x, $P_1(x \in \mathcal{G})$ and $P_2(x \in \mathcal{G})$, supplied by the assessor D.
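For a continuous $P_D(x)$, such as the Beta(6, 2) density of Example 3, the sum in equation (7) becomes an integral. The sketch below approximates that integral with the midpoint rule. It uses hypothetical classification priors, because the curves of Figure 6 are not reproduced here, so it does not recover the values 0.6605 and 0.7486; it only shows the mechanics. The function names are our own.

```python
def beta62_density(x: float) -> float:
    """Beta(6, 2) density on [0, 1]: x^5 (1 - x) / B(6, 2), where B(6, 2) = 1/42."""
    return 42.0 * x**5 * (1.0 - x)


def prob_fuzzy_set_continuous(density, mu, prior_in, n: int = 20_000) -> float:
    """Equation (7) with the sum replaced by an integral over [0, 1].

    The integral of posterior(x) * density(x) is approximated by the
    midpoint rule with n cells. posterior(x) is written in product form,
    mu * pi / (mu * pi + (1 - mu) * (1 - pi)), which equals the bracketed
    term of equation (7) whenever mu(x) and pi(x) are positive.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        m, p = mu(x), prior_in(x)
        den = m * p + (1.0 - m) * (1.0 - p)
        if den == 0.0:
            raise ValueError(f"posterior undefined at x = {x}")
        total += (m * p / den) * density(x) * h
    return total


def mu_good(x: float) -> float:
    """Z's membership function for the 'desirable' set in Example 3."""
    return x**4


# Flat classification prior pi_D(x in G) = 1/2 (hypothetical): the posterior is
# then mu(x) itself, so the answer reduces to E[X^4] under Beta(6, 2).
p_flat = prob_fuzzy_set_continuous(beta62_density, mu_good, lambda x: 0.5)

# Increasing prior pi_D(x in G) = x (hypothetical stand-in for Figure 6).
p_increasing = prob_fuzzy_set_continuous(beta62_density, mu_good, lambda x: x)
```

With the flat prior, `p_flat` is $E[X^4] = (6 \cdot 7 \cdot 8 \cdot 9)/(8 \cdot 9 \cdot 10 \cdot 11) = 21/55 \approx 0.382$ under Beta(6, 2). The increasing prior, which favours classification in $\mathcal{G}$ for large x, raises the answer, echoing the sensitivity to $\pi_D(x \in \mathcal{G})$ noted in Example 3.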


4.4 Reliability of Components in Vague Binary States

We say that a component’s state is ‘vague and binary’ if interest centres around a single vague
set of the kind $\mathcal{G}$ or B in our illustrations. As was mentioned before, we should bear in mind that,
in general, $\mathcal{G}^c$ need not be B and vice versa, unless of course $\mathcal{G}$ and B are precise sets. For
$\mathcal{G} = \{x; x \text{ is a ‘desirable’ state}\}$ and $\mu_{\mathcal{G}}(x)$ specified, it is reasonable to define the reliability of the
component as $P_D[X \in \mathcal{G}; \mu_{\mathcal{G}}(x)]$. Equation (6) can now be used to evaluate this probability. With
$B = \{x; x \text{ is an ‘undesirable’ state}\}$ and $\mu_B(x)$ specified, we may define the unreliability of the
component as $P_D[X \in B; \mu_B(x)]$. We could also have defined the unreliability of the component
as $P_D[X \in \mathcal{G}^c; \mu_{\mathcal{G}^c}(x)]$, where $\mathcal{G}^c$ is that fuzzy set whose membership function $\mu_{\mathcal{G}^c}(x)$ equals $1 - \mu_{\mathcal{G}}(x)$.
With either choice for the definition of unreliability, we see that, when a component’s state is
vague and binary, its unreliability is not necessarily the complement of its reliability! This result
is in contrast to that of binary coherent systems.
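The remark that unreliability need not complement reliability can be checked numerically. The sketch below is a minimal illustration under stated assumptions: $P_D(x)$ is the Beta(6, 2) density of Figure 5, $\mu_{\mathcal{G}}(x) = x^4$ as in Example 3, $\mu_B(x) = (1 - x)^2$ is a hypothetical asymmetric membership function for the ‘undesirable’ set (in the spirit of Figure 4(b)), and both classification priors are a flat 1/2, which is also hypothetical.

```python
def beta62_density(x):
    """Beta(6, 2) density, as in Figure 5: 42 x^5 (1 - x)."""
    return 42.0 * x**5 * (1.0 - x)


def prob_vague_state(mu, prior_in, n=20_000):
    """Equation (7) as a midpoint-rule integral against the Beta(6, 2) density."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        m, p = mu(x), prior_in(x)
        den = m * p + (1.0 - m) * (1.0 - p)
        if den == 0.0:
            raise ValueError(f"posterior undefined at x = {x}")
        total += (m * p / den) * beta62_density(x) * h
    return total


def flat(x):
    """Hypothetical classification prior, used for both G and B."""
    return 0.5


reliability = prob_vague_state(lambda x: x**4, flat)              # P_D[X in G; mu_G]
unreliability = prob_vague_state(lambda x: (1.0 - x) ** 2, flat)  # P_D[X in B; mu_B]
```

Under the flat prior each probability reduces to an expected membership, $E[X^4] = 21/55$ and $E[(1 - X)^2] = 1/12$ under Beta(6, 2), which sum to about 0.47 rather than 1.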

Example 4: The case of components that can exist in precise binary states can be encompassed
within the above framework: $\mu_{\mathcal{G}}(x) = 1$ for $x \ge x^*$ and zero otherwise, and $\pi_D(x \in \mathcal{G}) = 1$ if
$x \ge x^*$ and zero otherwise. Furthermore, $B = \mathcal{G}^c$; thus $P_D[X \in \mathcal{G}; \mu_{\mathcal{G}}(x)] = 1 - F_D(x^*)$ and
$P_D[X \in B; \mu_B(x)] = P_D[X \notin \mathcal{G}; \mu_{\mathcal{G}}(x)] = F_D(x^*)$, where $F_D(x^*)$ is the cumulative distribution
function (cdf) associated with $P_D(x)$, evaluated at $x^*$.
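Example 4 can be checked the same way: with step functions for both $\mu_{\mathcal{G}}(x)$ and $\pi_D(x \in \mathcal{G})$, the posterior in equation (7) is an indicator, and the integral collapses to $1 - F_D(x^*)$. The sketch assumes the Beta(6, 2) $P_D(x)$ of Figure 5 and a hypothetical threshold $x^* = 0.5$; the closed-form cdf $F(x) = 7x^6 - 6x^7$ follows from integrating $42t^5(1 - t)$ from 0 to x.

```python
def beta62_density(x):
    """Beta(6, 2) density: 42 x^5 (1 - x)."""
    return 42.0 * x**5 * (1.0 - x)


def beta62_cdf(x):
    """Closed-form Beta(6, 2) cdf: 7 x^6 - 6 x^7."""
    return 7.0 * x**6 - 6.0 * x**7


def reliability_precise(x_star, n=20_000):
    """Equation (7) with a step membership and a step classification prior at x_star."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        m = 1.0 if x >= x_star else 0.0   # mu_G(x)
        p = 1.0 if x >= x_star else 0.0   # pi_D(x in G)
        # Because m == p, the denominator m*p + (1-m)*(1-p) is always 1.
        total += (m * p) / (m * p + (1.0 - m) * (1.0 - p)) * beta62_density(x) * h
    return total


X_STAR = 0.5  # hypothetical threshold x*
r_numeric = reliability_precise(X_STAR)
r_closed = 1.0 - beta62_cdf(X_STAR)   # 1 - F_D(x*) = 0.9375 for x* = 0.5
```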

5 Binary State Systems with Imprecise Classification

The purpose of this section is to extend the development of Section 4.3 on binary state
components with imprecise classification to the case of binary state, n-component systems with
imprecise classification. By ‘binary state systems with imprecise classification’, we mean those
systems whose component states are vague and binary, and whose structure functions satisfy
the logical connectives of Lukasiewicz; see Section 2. Our motivation for choosing this as a
definition of structure functions is that the structure functions of binary state coherent systems
with precise classification are exactly the membership functions of certain precise sets. The
case of multistate systems with imprecise classification, though not discussed here, follows by
analogy.

5.1 Structure Functions as Membership Functions of Precise Sets

Let $X_i$ be the state of component i taking a particular value $x_i$, $i = 1, \ldots, n$. Suppose that
each $X_i$ can take values in $\mathcal{S} = \{x; 0 \le x \le 1\}$. Let $\mathcal{G}_i = \{x_i; x_i \text{ is a ‘desirable’ state}\}$, $\mathcal{G}_i \subset \mathcal{S}$.
Let $\mu_{\mathcal{G}_i}(x_i)$ denote the membership function of $\mathcal{G}_i$, $i = 1, \ldots, n$. For now, suppose that $\mathcal{G}_i$ is
precise for all i. That is, for each i, there exists an $x_i^*$ such that $\mu_{\mathcal{G}_i}(x_i) = 1$ (0) when $x_i \ge x_i^*$
($x_i < x_i^*$). For ease of notation, this section focuses solely on the subspace $\mathcal{G}_i$; therefore, we use
$\mu_i(x_i)$ to denote the above membership functions, with the understanding
that the membership function assigned is dependent on the fuzzy classification $\mathcal{G}_i$, which itself
depends on component i. For the remainder of this paper, we let $\mathcal{L}[x \notin \mathcal{G}_i; \mu_i(x)] = 1 - \mu_i(x)$
and $\mathcal{L}[x \notin \mathcal{G}_{\phi(X)}; \mu_{\phi(X)}(x)] = 1 - \mu_{\phi(X)}(x)$, where $\phi(X)$ is as defined in Section 1.2.

Let $X = (X_1, \ldots, X_n)$ and suppose that the n components are in series. Thus the system’s
structure function is 1 if and only if $X_i \ge x_i^*$ for all $i = 1, \ldots, n$. Moreover, $X_i \ge x_i^*$ implies that
$\mu_i(X_i) = 1$ for each i. Thus we may write

$$\phi_s(X) = \prod_{i=1}^{n} \mu_i(X_i) = \min_i[\mu_i(X_i)] = \mu_{(1:n)}(X), \tag{8}$$

where $\mu_{(1:n)}(X)$ is the membership function of the intersection of the n precise sets $\mathcal{G}_i$, $i = 1, \ldots, n$.
Thus, the structure function of a series system with precise classification can also be
interpreted as the membership function of the intersection of n precise sets. Similarly, if the
n components were to be connected in parallel redundancy, then the structure function of the
system would be


$$\phi_p(X) = \coprod_{i=1}^{n} \mu_i(X_i) = \max_i[\mu_i(X_i)] = \mu_{(n:n)}(X), \tag{9}$$


which is the membership function of the union of the $\mathcal{G}_i$, $i = 1, \ldots, n$. Finally, for a k-out-of-n
system, we could write


$$\phi_k(X) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} \mu_i(X_i) \ge k, \\ 0, & \text{otherwise.} \end{cases} \tag{10}$$


Whereas the relationships of equations (8) and (9) have an interpretation within the calculus of
fuzzy sets, equation (10) does not, because sums of membership functions are not a part of the calculus
of fuzzy sets. We therefore seek an alternate way of expressing $\phi_k(X)$, as follows.

Suppose that the $\mu_i(X_i)$ terms are relabeled so that $\mu_{(1:n)}(X)$ is the minimum and $\mu_{(n:n)}(X)$ is
the maximum; i.e. $\mu_{(1:n)}(X) \le \mu_{(2:n)}(X) \le \cdots \le \mu_{(n-k+1:n)}(X) \le \cdots \le \mu_{(n:n)}(X)$. Since each
$\mu_i(X_i)$ is either zero or one, the above ordering will result in equalities for many of the above
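terms. Once the above is done, we see that $\phi_k(X) = \mu_{(n-k+1:n)}(X)$, because at least k of the
memberships equal one exactly when the k-th largest of them does. Thus, in general, the structure
function of a k-out-of-n system is the membership function of the precise set obtained by intersecting
the k sets $\mathcal{G}_i$ with the largest memberships $\mu_i(X_i)$.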
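The equivalence between equation (10) and the order-statistic form $\phi_k(X) = \mu_{(n-k+1:n)}(X)$, the minimum of the k largest memberships, can be verified exhaustively for small n. The Python sketch below does so for 0/1 memberships; the function names are our own, and the final line simply applies the same order-statistic formula to fractional memberships.

```python
from itertools import product


def order_stat(mu, r):
    """mu_(r:n): the r-th smallest of the n membership values (r counted from 1)."""
    return sorted(mu)[r - 1]


def phi_series(mu):
    return order_stat(mu, 1)                  # equation (8): the minimum


def phi_parallel(mu):
    return order_stat(mu, len(mu))            # equation (9): the maximum


def phi_k_of_n(mu, k):
    return order_stat(mu, len(mu) - k + 1)    # mu_(n-k+1:n)


def phi_k_by_sum(mu, k):
    return 1 if sum(mu) >= k else 0           # equation (10)


def precise_case_agrees(max_n=6):
    """Compare mu_(n-k+1:n) with equation (10) for every 0/1 membership vector."""
    for n in range(1, max_n + 1):
        for mu in product((0, 1), repeat=n):
            for k in range(1, n + 1):
                if phi_k_of_n(mu, k) != phi_k_by_sum(mu, k):
                    return False
            # Series is n-out-of-n and parallel is 1-out-of-n.
            if phi_series(mu) != phi_k_of_n(mu, n) or phi_parallel(mu) != phi_k_of_n(mu, 1):
                return False
    return True


# Hypothetical fractional memberships: the 2-out-of-3 value is the middle one.
vague_example = phi_k_of_n([0.3, 0.9, 0.6], 2)   # -> 0.6
```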


5.2 Structure Functions of Vague Binary State Systems


Motivated by the material of the previous section, we define the structure functions of series,
parallel, and k-out-of-n systems whose component states are vague and binary as

$$\phi_s(X) = \min_i[\mu_i(X_i)] = \mu_{(1:n)}(X),$$

$$\phi_p(X) = \max_i[\mu_i(X_i)] = \mu_{(n:n)}(X), \quad \text{and}$$

$$\phi_k(X) = \mu_{(n-k+1:n)}(X).$$

These structure functions are identical to those for the case of binary precise sets, except that
now $\mu_i(X_i)$ is a membership function of an associated vague set $\mathcal{G}_i$, $i = 1, \ldots, n$.

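Finally, if $\pi_D(x_i \in \mathcal{G}_i)$ denotes D’s probability that a particular $x_i$ gets classified in $\mathcal{G}_i$, then by
analogy with equation (7), we have

$$P_D[X_i \in \mathcal{G}_i; \mu_i(x_i)] = \int \left[ 1 + \frac{1 - \mu_i(x_i)}{\mu_i(x_i)} \cdot \frac{\pi_D(x_i \notin \mathcal{G}_i)}{\pi_D(x_i \in \mathcal{G}_i)} \right]^{-1} dP_D(x_i), \tag{11}$$

where $P_D(x_i)$ is D’s probability that $X_i \le x_i$.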

Our development thus far has assumed that the membership functions $\mu_i(x_i)$, $i = 1, \ldots, n$,
are all distinct. Simplification occurs if $\mu_i(x_i) = \mu(x)$ for $i = 1, \ldots, n$. We limit our attention
to the case of series and parallel systems, because more complicated systems, such as networks,
can be represented as a combination of series-parallel systems.


5.3 Reliability of Vague Binary State Systems

If the state of each component in a system is a desirable state, will the system itself be in
a desirable state? The answer to this question need not be in the affirmative, because
requirements on the system could be more stringent than those on each component of the system.
This is unlike the case of binary state systems with precise classification, wherein a series system
is judged to be reliable if all its components are reliable. Thus, there are two possible ways in
which the reliability of a vague coherent system can be defined. The first is to assume that a
series system is reliable if all its components are in a desirable state. The second is to require that,
for a system to be judged reliable, its state, say x, be a desirable state. Specifically, we require
that $x \in \mathcal{G}_{\phi(X)}$, where $\mathcal{G}_{\phi(X)} = \{x; x \text{ is a ‘desirable’ system state}\}$ and $\mathcal{G}_{\phi(X)} \subset \mathcal{S}$. Associated with
$\mathcal{G}_{\phi(X)}$ is its membership function, $\mu_{\mathcal{G}_{\phi(X)}}(x)$. Similarly, in the case of a parallel system, we have
two possibilities for defining reliability: the first being that the system is reliable if at least
one of its components is in a desirable state, and the second being the requirement that its state
$x \in \mathcal{G}_{\phi(X)}$. We simplify notation by letting $\mu_{\phi(X)}(x) = \mu_{\mathcal{G}_{\phi(X)}}(x)$ and focusing the discussion on
the subspace $\mathcal{G}_{(\cdot)}$.

For assessing reliability, let us consider the first case for series and parallel systems. Assuming
the $X_i$’s independent, the reliability of a series system would be $\prod_{i=1}^{n} P_D[X_i \in \mathcal{G}_i; \mu_i(x_i)]$, where
$P_D[X_i \in \mathcal{G}_i; \mu_i(x_i)]$ is given by equation (11). The reliability of a parallel redundant system
is $P_D[\bigcup_{i=1}^{n} \{X_i \in \mathcal{G}_i\}; \mu_i(x_i), i = 1, \ldots, n]$; it can be evaluated by the Inclusion-Exclusion
formula of probability (Feller, 1968). The computations simplify when the $X_i$’s are assumed
identically distributed. The case of k-out-of-n systems follows along similar lines.
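For the first definition of system reliability, the computations reduce to standard probability once the component values $P_D[X_i \in \mathcal{G}_i; \mu_i(x_i)]$ are available from equation (11). The sketch below assumes the $X_i$’s independent and uses hypothetical component values; it evaluates the series product and the parallel case by inclusion-exclusion. Function names are our own.

```python
from itertools import combinations
from math import prod


def series_reliability(p):
    """Product of the component probabilities P_D[X_i in G_i; mu_i(x_i)]."""
    return prod(p)


def parallel_reliability(p):
    """P(at least one X_i in G_i) by inclusion-exclusion, assuming independence.

    Each r-fold intersection has probability equal to the product of its
    members, and the terms alternate in sign with r.
    """
    total = 0.0
    for r in range(1, len(p) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        total += sign * sum(prod(c) for c in combinations(p, r))
    return total


# Hypothetical component values, e.g. as returned by equation (11).
p_components = [0.66, 0.75, 0.5]
r_series = series_reliability(p_components)
r_parallel = parallel_reliability(p_components)
```

Under independence the inclusion-exclusion sum, which has $2^n - 1$ terms, collapses to $1 - \prod_i (1 - p_i)$; this identity is a convenient check, and the general form is only needed when the joint probabilities do not factor.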

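With regard to the above, a question arises as to what we mean by independence of the $X_i$’s
when the $X_i$’s take values in a vague set. In the context of equation (11), $X_i$ and $X_j$, $i \ne j$, will be
judged independent if

$$P_D(X_i \le x_i, X_j \le x_j) = P_D(X_i \le x_i) \cdot P_D(X_j \le x_j), \quad \text{and if}$$

$$\pi_D(x_i \in \mathcal{G}_i, x_j \in \mathcal{G}_j) = \pi_D(x_i \in \mathcal{G}_i) \cdot \pi_D(x_j \in \mathcal{G}_j) \quad \text{and}$$

$$\pi_D(x_i \notin \mathcal{G}_i, x_j \notin \mathcal{G}_j) = \pi_D(x_i \notin \mathcal{G}_i) \cdot \pi_D(x_j \notin \mathcal{G}_j).$$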


The more interesting case is the second one, wherein a system is reliable if the state in which
it resides is a desirable one. We start with the case of a series system with structure function
$\phi_s(X)$. Its reliability is $P_D[\phi_s(X) \in \mathcal{G}_{\phi_s(X)}; \mu_{\phi_s(X)}(x)]$ which, from equation (11), is of the form

$$P_D[\phi_s(X) \in \mathcal{G}_{\phi_s(X)}; \mu_{\phi_s(X)}(x)] = \int \left[ 1 + \frac{1 - \mu_{\phi_s(X)}(x)}{\mu_{\phi_s(X)}(x)} \cdot \frac{\pi_D(x \notin \mathcal{G}_{\phi_s(X)})}{\pi_D(x \in \mathcal{G}_{\phi_s(X)})} \right]^{-1} dP_D(x), \tag{12}$$

where $\pi_D(x \in \mathcal{G}_{\phi_s(X)})$ is D’s probability that x is classified in $\mathcal{G}_{\phi_s(X)}$ were $\phi_s(X) = x$, and $P_D(x)$
is D’s probability that $\phi_s(X) \le x$.

Since $\phi_s(X) = \min_i \mu_i(X_i) = \mu_{(1:n)}(X)$, we obtain $P_D(x)$ as follows:

$$\begin{aligned} P_D(\phi_s(X) > x) &= P_D(\mu_{(1:n)}(X) > x) \\ &= P_D(\mu_i(X_i) > x, \ i = 1, \ldots, n) \\ &= P_D(X_i > \mu_i^{-1}(x), \ i = 1, \ldots, n) \\ &= \prod_{i=1}^{n} P_D[X_i > \mu_i^{-1}(x)], \quad \text{if the } X_i\text{'s are assumed independent,} \end{aligned}$$

where $\mu_i^{-1}$ denotes the inverse of $\mu_i$ (the third step uses the fact that each $\mu_i$ is increasing, so that the inverse exists and preserves the inequality). Subsequently, $dP_D(x)$ can be obtained. If the $X_i$’s
cannot be judged independent with respect to D’s distribution for the $X_i$’s, we need to specify
a joint distribution for these, such as Marshall & Olkin’s (1967) multivariate exponential, or
any of its variants. In the case of parallel systems, the development will proceed along similar
lines, save that now $P_D(x)$ will be obtained via $\prod_{i=1}^{n} P_D[X_i \le \mu_i^{-1}(x)]$. Finally, the case of
$(n - k + 1)$-out-of-n systems would follow by considering the distribution of the k-th order membership
function, $\mu_{(k:n)}(X)$.
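The derivation of $P_D(\phi_s(X) > x)$ can be checked by simulation. The sketch below assumes two independent components, each with the Beta(6, 2) $P_D$ of Figure 5 and the membership function $\mu_i(x) = x^2$ of Example 5, which is increasing on [0, 1] with inverse $\sqrt{y}$. The seed, the number of draws, and the function names are arbitrary choices for illustration.

```python
import math
import random


def beta62_cdf(x):
    """Beta(6, 2) cdf on [0, 1]: 7 x^6 - 6 x^7."""
    return 7.0 * x**6 - 6.0 * x**7


def series_survival(y, n=2):
    """P_D(phi_s(X) > y) for n i.i.d. Beta(6, 2) components with mu_i(x) = x^2.

    mu_i is increasing on [0, 1] with inverse sqrt(y), so each factor is
    P_D[X_i > sqrt(y)] = 1 - F(sqrt(y)).
    """
    return (1.0 - beta62_cdf(math.sqrt(y))) ** n


def series_survival_mc(y, n=2, draws=100_000, seed=12345):
    """Monte Carlo estimate of P_D(min_i X_i^2 > y) in the same setting."""
    rng = random.Random(seed)
    hits = sum(
        min(rng.betavariate(6, 2) ** 2 for _ in range(n)) > y for _ in range(draws)
    )
    return hits / draws


y = 0.5
exact = series_survival(y)
estimate = series_survival_mc(y)
```

The closed form gives about 0.429 at y = 0.5, and the Monte Carlo estimate should agree to within a few thousandths. Differentiating $1 - P_D(\phi_s(X) > x)$ with respect to x then yields the density needed for $dP_D(x)$ in equation (12).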


Example 5: Consider a two-component series system where the component performances are
independent and identically distributed. D wishes to assess $P_D[\phi_s(X) \in \mathcal{G}_{\phi_s(X)}; \mu_{\phi_s(X)}(x)]$. The
first option is to compute the product of the component probabilities. Let $\mu_i(x) = x^2$, and let $P_D(x)$
and $\pi_D(x \in \mathcal{G}_i)$ be as shown in Figures 5 and 6, respectively, for i = 1, 2. Then, $P_D[\phi_s(X) \in
\mathcal{G}_{\phi_s(X)}; \mu_{\phi_s(X)}(x)] = 0.6232$. The second option is to compute the system reliability directly,
through the use of Z’s membership function for the entire system. Supposing that the expert
holds a stronger standard for the system to be in a desirable state than that for the components,
we let $\mu_{\phi_s(X)}(x) = x^{10}$. Meanwhile, D considers $P_D(x)$ and $\pi_D(x \in \mathcal{G}_i)$ as specified in
Figures 5 and 6 for $\phi_s(X)$, implying that $P_D[\phi_s(X) \in \mathcal{G}_{\phi_s(X)}; \mu_{\phi_s(X)}(x)] = 0.4321$. Thus, by
holding the system to a more stringent standard, D’s assessment of the system reliability is lower
when considered directly, as opposed to that when using a more relaxed membership function
to represent belief at the component level.

6 Maintenance Management in a Vague Environment

Examples 3-5 illustrate how D is able to assess the probability that the state of a unit will
be in a ‘desirable’ state, or its complement. Why would D be interested in such a probability
instead of the probability that the state of the unit will be x, $0 \le x \le 1$? Reasons were given in
Section 4.3, the one pertaining to communication using ‘natural language’ being the most
relevant. This point is best underscored via the scenario of maintenance, wherein one must decide
whether to repair, replace, or simply continue to monitor the unit. In practice, judgments about
maintenance are not based on assessments of uncertainty about x; they are based on conjectures
about whether or not the unit will be in a ‘desirable’ state.

Consider the following: a unit is required to perform service for some time period. The unit
can exist in one of three states: $\mathcal{G}$ (for good), B (for bad), and A (for acceptable). When the unit is
in state $\mathcal{G}$, the utility to D provided by the unit is $U(\mathcal{G})$; analogously, we define $U(A)$ and $U(B)$. It
is reasonable to suppose that $U(A) < U(\mathcal{G})$ and, in principle, $-U(B)$ could be greater than $U(\mathcal{G})$,
i.e. the cost of being in state B could dominate the reward for being in state $\mathcal{G}$. With the above
in place, D’s problem is to decide whether to replace the unit, denoted $\mathcal{R}$, to repair
the unit, denoted $\mathcal{M}$, or to do nothing, denoted $\mathcal{N}$. There is a cost associated with each of these
three actions, and these are denoted $-U(\mathcal{R})$, $-U(\mathcal{M})$, and $-U(\mathcal{N})$, respectively. Presumably,
$-U(\mathcal{N}) < -U(\mathcal{M}) < -U(\mathcal{R})$. Which of the above three actions should D take?

The problem is solved by using maximization of expected utility (MEU) [cf. Lindley (1991),
p. 58]. The decision tree of Figure 7 facilitates an implementation of this recipe; the rectangle
represents D’s decision node and the three circles denoted $R_1$, $R_2$, and $R_3$ represent the three
random nodes corresponding to the three actions $\mathcal{R}$, $\mathcal{M}$, and $\mathcal{N}$, respectively. Each random node results
in one of three outcomes, $\star = \mathcal{G}, A$, or B, and these are portrayed in Figure 7 only for the node $R_3$.
At the terminus of the tree are the utilities. For example, $U(\mathcal{N}, \mathcal{G})$ denotes the utility to D
when D’s decision is to monitor the unit and the outcome is $\mathcal{G}$.

The MEU principle requires that, at each random node, D compute an expected utility of
the action that leads to that node. For this, D needs to assess the probabilities that, at $\tau$, the
state of the unit will be in $\mathcal{G}$, A, and B, respectively. These probabilities would depend on three
ingredients: membership functions of the kind $\mu_{\mathcal{G}}(x)$, $\mu_A(x)$, and $\mu_B(x)$; D’s prior probability
that an x is classified (by nature) in $\mathcal{G}$, A, and B (i.e. $\pi_D(x \in \star)$, $\star = \mathcal{G}, A, B$); and $P_D(x)$, D’s
subjective probability that the state of the unit will be x. Since $\sum_{\star} \pi_D(x \in \star) = 1$, D need
only specify any two of the three classification probabilities. Once these are at hand, D invokes equation (7) to obtain
the required probabilities. All of the above is straightforward except that $P_D(x)$ depends on the
action that D takes. Both repair and replacement actions tend to shift the mass of $P_D(x)$
toward one. Thus, with respect to the illustration of Figure 5, a repair action will tend to shift the
probability mass closer to one, and a replacement even more so. To summarize, the impact of D’s
actions on D’s probabilities of the state of the unit is reflected only in $P_D(x)$. The membership
functions and the classification probabilities are unaffected. To denote such a dependence, we
shall replace the $P_D(x)$ of equation (7) by $P_D(x; \bullet)$, and $P_D[X \in \star; \mu_\star(x)]$ by $P_D[X \in \star; \mu_\star(x), \bullet]$,
for $\bullet = \mathcal{R}, \mathcal{M}$, and $\mathcal{N}$; $\star = \mathcal{G}, A, B$.

[Figure 7: decision tree for D’s maintenance decision; terminal utilities shown include $U(\mathcal{N}, \mathcal{G})$, $U(\mathcal{N}, A)$, and $U(\mathcal{N}, B)$.]

Whereas the development in Sections 4 and 5 pertained to the binary case involving two
vague sets B and $\mathcal{G}$, our example here involves three vague sets A, B, and $\mathcal{G}$, and their respective
membership functions, $\mu_\star(x)$, $\star = A, B, \mathcal{G}$. Of these, only $\mu_A(x)$ warrants comment, since
the general nature of the other two has been discussed before; see Figures 4(a) and (b). It is
reasonable to suppose that the general form of $\mu_A(x)$ is either bell-shaped or an inverted U.

Finally, a question arises as to whether $\mu_A(x)$, $\mu_B(x)$, and $\mu_{\mathcal{G}}(x)$ can take any arbitrary form
independent of each other. The answer to this question is in the negative, because the membership
functions determine the quantities $P_D[X \in A; \mu_A(x)]$, $P_D[X \in B; \mu_B(x)]$, and $P_D[X \in
\mathcal{G}; \mu_{\mathcal{G}}(x)]$, and these must sum to one. Thus, D needs to ensure coherence of the membership
functions, just as D needs to ensure coherence of the classification and state probabilities.
Since D elicits membership functions from Z, it is incumbent on D to ensure that membership
functions do not lead to results that violate the countable additivity axiom of probability. This
important point has not been addressed in Singpurwalla & Booker (2004).

The utilities at the termini of a tree, $U(\mathcal{N}, \mathcal{G})$, $U(\mathcal{N}, \mathcal{A})$ and $U(\mathcal{N}, \mathcal{B})$, are straightforward to write out. Thus, for example, $U(\mathcal{N}, \mathcal{G}) = U(\mathcal{N}) + U(\mathcal{G})$, which is the sum of the disutility due to monitoring and the utility of the unit being in state $\mathcal{G}$. Similarly, $U(\mathcal{N}, \mathcal{B}) = U(\mathcal{N}) + U(\mathcal{B})$, and $U(\mathcal{N}, \mathcal{A}) = U(\mathcal{N}) + U(\mathcal{A})$. With this in place, we compute the expected utility at each node. Thus, for example, $\bar{U}(\mathcal{N})$, the expected utility at node $R_3$, is

$$\bar{U}(\mathcal{N}) = \sum_{* = \mathcal{G}, \mathcal{A}, \mathcal{B}} U(\mathcal{N}, *)\, P_D(X \in *; \mu_*(x), \mathcal{N}),$$

where $P_D(X \in *; \mu_*(x), \mathcal{N})$ is the right-hand side of equation (7) with $P_D(x)$ replaced by $P_D(x; \mathcal{N})$. The expected utilities $\bar{U}(\mathcal{R})$ and $\bar{U}(\mathcal{M})$ at nodes $R_1$ and $R_2$, respectively, are computed analogously, mutatis mutandis. Once the above are done, D's maintenance decision is to choose the action for which the expected utility is a maximum. Thus, for example, if $\bar{U}(\mathcal{N}) > \bar{U}(\mathcal{R}) > \bar{U}(\mathcal{M})$, then D's decision would simply be to do nothing.
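The node-by-node comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The disutilities, state utilities and classification probabilities below are made-up numbers that stand in for the output of equation (7), which is not reproduced here. The labels R, M, N and G, A, B only mirror the action and state symbols, and the helper names are hypothetical. The sketch also checks that each action's state probabilities sum to one, which is the coherence requirement discussed above.

```python
import math

STATES = ("G", "A", "B")

# Hypothetical disutilities U(action) and utilities U(state); illustrative only.
ACTION_UTILITY = {"R": -5.0, "M": -2.0, "N": -0.5}
STATE_UTILITY = {"G": 10.0, "A": 6.0, "B": -8.0}

# Hypothetical classification probabilities P_D(X in state; mu_state(x), action).
# In the paper these would come from equation (7); here they are simply assumed.
STATE_PROBS = {
    "R": {"G": 0.85, "A": 0.10, "B": 0.05},
    "M": {"G": 0.60, "A": 0.30, "B": 0.10},
    "N": {"G": 0.40, "A": 0.35, "B": 0.25},
}

def terminal_utility(action: str, state: str) -> float:
    """Additive terminal utility U(action, state) = U(action) + U(state)."""
    return ACTION_UTILITY[action] + STATE_UTILITY[state]

def expected_utility(action: str) -> float:
    """Expected utility at the node for `action`, after a coherence check."""
    probs = STATE_PROBS[action]
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError(f"state probabilities for {action} do not sum to one")
    return sum(terminal_utility(action, s) * probs[s] for s in STATES)

def best_action() -> str:
    """Choose the action whose expected utility is a maximum."""
    return max(ACTION_UTILITY, key=expected_utility)

for a in ACTION_UTILITY:
    print(a, round(expected_utility(a), 3))
print("decision:", best_action())
```

With these illustrative numbers `best_action()` returns `"M"`. If `STATE_PROBS["N"]` were shifted toward the state G, doing nothing would eventually become the optimal action.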

How does the above material differ from that which is currently available in the literature on maintenance planning? The current literature would require each node to be binary and, to compute the expected utility at each node, all we need is the probability that $x > x^*$. This probability can be had once D specifies $P_D(x; \bullet)$, $\bullet = \mathcal{R}, \mathcal{N}$ and $\mathcal{M}$. By contrast, we allow $x$ to exist in three vaguely defined sets, and allow $x$ to exist simultaneously in more than one of these. The advantage is flexibility, together with an analysis that accommodates natural language communication. Further, in the existing literature, uncertainties about times to failure are assessed via probabilistic failure models, and failure is viewed as a sharply defined event. Consequently, the analysis is forced into a binary framework. By contrast, our uncertainties are focused on $x$, which can encapsulate the degradation of a unit.

International Statistical Review (2008), 76, 2, 247–267
© 2008 The Authors. Journal compilation © 2008 International Statistical Institute
Many-valued Logic in Multistate and Vague Stochastic Systems


7  Summary  and  Conclusions 

The term 'complex stochastic systems' is well entrenched in the vocabulary of statisticians, though it generally pertains to a use of the Markov Chain Monte Carlo method. This paper takes a broader view of this term by embedding within it the theory of vague coherent structures. This theory, which is generally associated with work in applied probability and reliability, is germane to statisticians, especially those whose focus is on biostatistics, genetics, graphical models, and neural nets. With that in mind, we have devoted Section 1 to an overview of the key notions and ideas of binary state systems whose two states can be precisely delineated.
The  mathematics  which  drives  the  development  of  results  for  such  systems  is  binary  logic.  In 
Section  1,  we  also  set  the  stage  for  the  material  of  Sections  4  and  5  by  introducing  the  idea 
of  imprecise  or  vague  sets.  The  need  for  such  sets  has  been  acknowledged  by  physicists, 
philosophers,  and  logicians.  More  recently,  their  need  has  also  been  recognized  by  those  involved 
in  decision  making  and  natural  language  processing.  Section  2  is  devoted  to  multivalued 
logic  in  the  context  of  multivalued  propositions.  The  focus  here  is  on  the  connectives  of 
conjunction  and  disjunction;  these  connectives  can  be  used  to  define  the  structure  function 
of multistate systems, a topic treated in Section 3. In Section 3, it is assumed that the classification of states is precise. This topic has been covered before in the literature on multistate reliability; however, what is new here is the departure from binary logic to multivalued logic.

Sections 4 and 5 give this paper its novel feature. Specifically, they pertain to the development of reliability for components and systems whose state space is vague. In actuality, vague state spaces are more realistic than the usual zero-one states, which are an idealization. In Sections 4 and 5, we also show that the usual notions of reliability do not always hold when the state space is vague. For example, the unreliability of a unit is not one minus its reliability, and there is more than one way to define system reliability.

There  is  another  aspect  of  this  paper  that  warrants  comment.  In  the  existing  theory  of  coherent 
structures  with  precise  classification,  statistical  principles  have  no  role  to  play.  All  that  is  needed  is 
the  calculus  of  probability.  By  contrast,  when  dealing  with  vague  systems,  membership  functions 
and  consistency  profiles  create  a  role  for  the  likelihood  function  and,  in  so  doing,  mandate  a 
consideration  of  the  principles  of  Bayesian  statistical  inference. 

The illustrative examples of Sections 4 and 5, and the maintenance management architecture of Section 6, should give the reader an inkling of the practical import of the material here. For example, in maintenance and replacement decisions made under uncertainty, the usual strategy is to assume that the state space is binary: functioning and failed. In actuality, functioning can occur at different levels whose boundaries cannot be sharply delineated. Thus, it makes more sense to study maintenance and replacement when the state space is vague for, in actuality, this is how such decisions are made.
Résumé

The state of the art in the theory of coherent structures is guided by two assertions, both of which are limiting: (1) every unit of a system can exist in one of two states, failed or functioning; and (2) at any moment, each unit can exist in only one of these states. In reality, units can exist in more than two states, and a unit may exist simultaneously in more than one state. This last characteristic is a consequence of the view that it may not be possible to define precisely the subsets of a set of states; such subsets are called vague. The first restriction has been addressed by the methods called "multistate systems"; however, these methods have not taken advantage of the mathematics of many-valued propositions in logic. Here, we invoke its truth tables to define the structure function of multistate systems, and then exploit our results in the context of vagueness. A key contribution of this paper is to argue that many-valued logic is a common platform for studying both multistate systems and vague systems, but to do this, it is necessary to rely on several principles of statistical inference.
Abstract

Assessing  conditionals  based  on  any  specified  probability  model  is  straightforward  and  unique  when  the  conditioning 
event  is  in  the  subjunctive  mood;  that  is,  supposing  that  the  conditioning  event  were  to  occur.  The  matter  becomes 
problematic,  however,  when  the  conditioning  event  actually  does  occur  as  observed  data,  and  thus  becomes  a  reality.  We 
illustrate  this  point  by  considering  a  commonly  occurring  scenario  in  the  actuarial  sciences,  engineering  reliability,  survival 
analysis, and, in general, any activity that involves filtering. We argue that there could be more than one way to
bet  on  residual  life.  Our  message  is  that  it  is  the  likelihood — not  Bayes’  Law — which  is  the  tail  that  wags  the  dog! 

This  paper  should  appeal  to  both  probabilists  and  statisticians  who  are  interested  in  foundational  issues.  It  has  been 
written to honour Richard Johnson, whose Editorship of Statistics and Probability Letters has provided a platform for
dialogue  between  probabilists,  statisticians,  and  those  who  strive  to  be  both. 

©  2007  Elsevier  B.V.  All  rights  reserved. 
1. Introduction

In the process of using marker data to assess the lifetime of an item experiencing failure due to ageing, we were confronted by a dilemma that sneaked up on us as a matter of course (see Singpurwalla, 2006a). It turns
out  that  the  scenario  leading  to  the  dilemma  is  quite  common  and  can  arise  when  addressing  practical  issues  of 
conditioning  in  the  actuarial,  the  engineering,  and  the  biomedical  sciences.  Stripped  to  its  essentials,  the 
scenario  goes  as  follows. 

Suppose that an item's lifetime $X$ is judged to have a distribution function $G(x) = P(X \le x)$, and a survival function $\bar{G}(x) = 1 - G(x) = P(X > x)$. We suppose that lifetime can be continuously monitored, so that $x \ge 0$. Were this item to survive until $x$, its residual (or remaining) lifetime would be $X - x$. We are required to make statements of uncertainty about $(X - x)$ so that actuarial, engineering, or medical decisions about the item can be made. That is, we are required to specify $P(X - x > u \mid X > x)$ for all $u > 0$. Our interpretation of probability is de Finettian (see de Finetti, 1937), in the sense that probability reflects one's disposition to a two-sided bet. Thus, probability assessments can be seen as a device for hedging our bets on the item's survival, or on some other unknown quantity of interest, such as parameters in probability models.
A solution to the problem posed is elementary and unique, given a distribution function $G$. Specifically, for any $u > 0$,

$$P(X - x > u \mid X > x) = P(X > x + u \mid X > x) = \frac{P(X > x + u)}{P(X > x)} = \frac{\bar{G}(x + u)}{\bar{G}(x)}. \tag{1.1}$$
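Eq. (1.1) translates directly into code. The Python sketch below is illustrative only. The Weibull survival function is an assumed choice of $\bar{G}$, not one made in the text, and the function names are ours.

```python
import math
from typing import Callable

def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Assumed model for bar-G: exp(-(t/scale)**shape), t >= 0."""
    return math.exp(-((t / scale) ** shape))

def residual_survival(survival: Callable[[float], float], x: float, u: float) -> float:
    """Eq. (1.1): P(X - x > u | X > x) = bar-G(x + u) / bar-G(x)."""
    if u < 0:
        raise ValueError("u must be non-negative")
    denom = survival(x)
    if denom <= 0.0:
        raise ValueError("bar-G(x) must be positive to condition on X > x")
    return survival(x + u) / denom

# Weibull with an increasing hazard (shape 2, scale 100), item aged 50.
g_bar = lambda t: weibull_survival(t, shape=2.0, scale=100.0)
print(residual_survival(g_bar, x=50.0, u=20.0))  # approximately exp(-0.24)
```

Setting `shape=1.0` makes the model exponential. The answer then no longer depends on `x`, which is the memoryless property.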


Suppose now that, instead of the subjunctive "were the item to survive until $x$", we are told that the item actually did survive to $x$. That is, the event $(X > x)$ is no longer an uncertain event; $(X > x)$ has now become observed data. What then would our assessment of the uncertainty about the residual life $(X - x)$ be? In other words, how would we bet on the event $(X - x > u)$, for $u > 0$? Would it continue to be $\bar{G}(x + u)/\bar{G}(x)$, or could it be something else? If the latter, would the number to bet be unique? For a discussion of these and related questions, see Freedman and Purves (1969). A more recent discourse on the different kinds of conditional beliefs is in Joyce (1999, Chapter 6).

Intuitively, it seems that there ought to be some distinction between looking at $(X > x)$ as a possibility versus looking at it as a fact that is revealed as data. Thus, $\bar{G}(x + u)/\bar{G}(x)$ need not be the correct answer. Yet many individuals, when faced with this problem, would simply mimic the steps leading to Eq. (1.1) and continue to declare $\bar{G}(x + u)/\bar{G}(x)$ as their answer. In doing so they do not appear to be making a distinction between $(X > x)$ as a supposition versus a reality. Alternatively put, they may be failing to recognize that in a conditional probability statement, the word "given" does not indicate a fact; rather, it indicates a supposition that the conditioning event is true. Thus, are those who declare $\bar{G}(x + u)/\bar{G}(x)$ as their answer, irrespective of the character of the conditioning event, in error, or is there a rationale for their answer?

We claim that the rationale cannot lie completely within the calculus of probability, because the notion of probability, at least from a subjectivistic point of view, is germane only when the disposition of all events in question is unknown. Thus, for example, it may not make sense to say that the probability that a coin with heads on both faces will land heads when flipped is one. This is because the disposition of the outcome is known before the flip. Consequently, a two-sided bet on the outcome heads has to be $1, which will be exchanged for $1 when the coin lands heads, which it will. The two-sided bet of $1 is thus meaningless. The rationale therefore must come from concepts in statistics, wherein the notion of a likelihood plays a signal role. By all accounts, the notion of a likelihood appears to be alien to probability theory.

In what follows we point out that there are both philosophical and technical arguments which support $\bar{G}(x + u)/\bar{G}(x)$ as an answer, but that this answer is one among other possible answers. This is the main point of this article. Arguments about conditioning are common among philosophers of science. That such arguments could also be relevant to reliability, survival analysis, filtering, and forecasting does not seem to have been recognized.


2.  Answer(s)  to  the  question 

2.1.  Reassessment  and  the  principle  of  conditionalization 

Some individuals, when faced with the matter of assessing $P(X - x > u)$ with $(X > x)$ as observed data, may choose to reassess all probabilities, treating the factual event $(X > x)$ as a part of background history; that is, they would start from ground zero, even if the observed $(X > x)$ is not a surprise. Diaconis and Zabell (1982) label a process like this complete reassessment; however, the driving premise considered by these authors differs from the one we are discussing here, in that their observed event is considered to be a surprise. In a reassessment one essentially starts all over again from scratch and possibly even rejects $G$ as the underlying probability model. The answer that one obtains may therefore not be $\bar{G}(x + u)/\bar{G}(x)$. Reassessment is a perfectly legitimate step; its main danger is the risk of incoherence (i.e. a lack of consistency). We therefore do not pursue this line of reasoning here, and do not advocate reassessment as a strategy.

To ensure coherence one may proceed formally by invoking Bayes' Law as an inferential mechanism, using $(X > x)$ as data. There are two directions from which this can be approached, one general, the other specific. These we describe in Sections 2.2 and 2.3, respectively, wherein we point out that there need not be a unique answer to the question posed, and that under a certain assumption, $\bar{G}(x + u)/\bar{G}(x)$ will indeed be one of several possible answers.

But there is another, more philosophical, argument that supports $\bar{G}(x + u)/\bar{G}(x)$ as a correct answer. This argument, known as the Principle of conditionalization (cf. Howson and Urbach, 1989, p. 68), proceeds as follows:

Prior to observing $(X > x)$ as factual data, we had declared that $\bar{G}(x + u)/\bar{G}(x)$ would represent our bet (or personal probability) on the event $(X - x > u)$, for some $u > 0$, were the event $(X > x)$ to turn out to be a fact. Now that $(X > x)$ has revealed itself as being actually true, we shall act as we had declared, and thus $\bar{G}(x + u)/\bar{G}(x)$ would continue to be our bet. As suggested by a reviewer, another way to articulate the principle of conditionalization is to assert that "if I say I am going to do something, I will do it".

Those who subscribe to a complete reassessment by starting all over from scratch may reject the principle of conditionalization on the grounds that the actual occurrence of the event $(X > x)$ has changed their psychological disposition so dramatically from their disposition under the supposition that $(X > x)$ that they can no longer subscribe to $G$ as their model of uncertainty. They then seek an alternative to $G$, say $H$, as a model for assessing $(X - x)$. This point was made by Ramsey (1931) (cf. Diaconis and Zabell, 1982), who stated that

[The degree of belief in p given q] is not the same as the degree to which [a subject] would believe p, if he believed q for certain; for knowledge of q might for psychological reasons profoundly alter his whole system of beliefs.

Diaconis  and  Zabell  (1982)  also  cite  other,  more  modern,  references  that  mention  the  above  issue;  these  are 
Hacking  (1967),  de  Finetti  (1972,  p.  150;  1975,  p.  203),  Teller  (1976),  and  Freedman  and  Purves  (1969). 

Additionally, there happens to be empirical evidence from quantum mechanics, via the "double slit experiment", that rejects the conditionalization principle. This experiment has now become a classic thought experiment for its clarity in expressing the central puzzles of quantum mechanics. In its original version, performed by the English scientist Thomas Young sometime around 1805, the experiment consisted of letting light diffract through two slits, producing fringes on a screen. The goal of the experiment was to resolve the question as to whether light is composed of particles or waves. Current versions of the experiment are performed with electrons instead of light (cf. Jonsson, 1974). Such experiments have shown that the probability (as assessed via the relative frequency) of some event, say $B$, when an event $A$ always occurs, is not equal to the conditional probability of $B$ given $A$ found from an experiment in which $A$ occurs in some replications and the complement of $A$ occurs in other replications. This amounts to a negation of the principle of conditionalization.

2.2. Using Bayes' Law, directly

The clearest, and perhaps the most natural, way to address the question posed is via a use of Bayes' Law. But to better articulate the workings of this law in the present context, we introduce the convention (see Singpurwalla, 2006b) that for two events $A$ and $B$, $P(A \mid B)$ denotes the conditioning (or supposition) that $B$ is true, whereas $P(A; B)$ denotes the fact that $B$ is actually true. With the above convention in place, our problem boils down to assessing $P(X > x + u; X > x)$. The answer is given by Eq. (2.2). But the arguments leading to this equation entail a transition from purely probabilistic considerations to statistical ones, and these are worth reiterating.

To assess $P(X > x + u; X > x)$, one way to start is by considering the proposition $P(X > x + u \mid X > x)$, which by Bayes' Law leads us to the inverse relationship

$$P(X > x + u \mid X > x) \propto P(X > x \mid X > x + u)\, P(X > x + u), \tag{2.1}$$

where "$\propto$" denotes proportional to. Eq. (2.1) is an honest-to-goodness probability statement.

However, since $(X > x)$ has been observed as data, the middle term of Eq. (2.1) does not make sense as a probability. Instead, it is the likelihood of the event $X > x + u$ with $X > x$ fixed. We denote this likelihood by $\mathcal{L}(X > x + u; X > x)$. Similarly, $P(X > x + u \mid X > x)$ must now be written as $P(X > x + u; X > x)$. In writing $\mathcal{L}(X > x + u; X > x)$ we interpret $X > x + u$, $u > 0$, as a hypothesis and $X > x$ as data. This interpretation is not conventional, in the sense that in statistical inference likelihoods are generally functions of unknown parameters, not unknown events. However, as stated by Edwards (1992, p. 12), the likelihood can be regarded as a function of the hypotheses or of the parameters. A treatment of the question posed that uses a parametric model, so that the likelihood becomes a function of the parameter, will be discussed in Section 2.3.

With the above in place, Eq. (2.1) now becomes

$$P(X > x + u; X > x) \propto \mathcal{L}(X > x + u; X > x)\, P(X > x + u). \tag{2.2}$$

The last term of the above expression, which involves the still unknown event $(X > x + u)$, is $\bar{G}(x + u)$.

According to Basu (1975, 1982), when Fisher (1912) rediscovered the Gaussian notion of likelihood, he looked upon it as "a scale of comparative support lent by the data to various possible values of $\theta$ [an unknown parameter]"; also see Edwards (1992, p. 221). This interpretation of likelihood is (symmetrically) different from the conventional interpretation, in which the likelihood tells us which hypothesis better supports the data (cf. Edwards, 1992, p. 9). The point of view that we adopt here is the former. Having done so, we are, in principle, free to choose the functional form of the likelihood function as we see fit. Suppose, then, that the likelihood is taken to be a constant, say 1, over all values of $x + u$, with $x$ fixed; see Fig. 1. Note that this choice will also be in keeping with the conventional use of the likelihood. Then Eq. (2.2) would become

$$P(X > x + u; X > x) \propto 1 \cdot P(X > x + u),$$

which when normalized yields $P(X > x + u)/P(X > x) = \bar{G}(x + u)/\bar{G}(x)$ as an answer. Thus, implicit in the answer given by those who subscribe to the principle of conditionalization (i.e. those who mimic the steps to assess conditional probability) is the assumption of a constant likelihood!

Since one is free to choose the functional form of the likelihood, what if the likelihood were chosen (see Fig. 1) to be some other function of $u$, say $\exp(-u)$, for $u > 0$? Our assessment of $P(X > x + u; X > x)$ would be different; namely, it would be $\exp(-u)\,\bar{G}(x + u)/\bar{G}(x)$. This means that it is the form of the likelihood that dictates how we would bet on residual life. The standard answer $\bar{G}(x + u)/\bar{G}(x)$ arises only in the special case of a constant likelihood.

The constant likelihood encapsulates a user's disposition of indifference with respect to the observed $X > x$; a decreasing likelihood encapsulates one of conservatism. The form of the likelihood can therefore be given a behaviouristic justification.
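A short Python sketch of Eq. (2.2) contrasts the constant likelihood with the exponentially decreasing one. Following the two cases worked out above, it scales $\mathcal{L}(u)\,\bar{G}(x+u)$ so that the assessment equals 1 at $u = 0$. Several choices here are our own assumptions: using that normalization for an arbitrary likelihood, the exponential model for $\bar{G}$, and the function names.

```python
import math
from typing import Callable

def bet_on_residual_life(survival: Callable[[float], float],
                         likelihood: Callable[[float], float],
                         x: float, u: float) -> float:
    """Sketch of Eq. (2.2): L(u) * bar-G(x + u), scaled to equal 1 at u = 0."""
    return likelihood(u) * survival(x + u) / (likelihood(0.0) * survival(x))

rate = 0.1                               # assumed exponential model for bar-G
g_bar = lambda t: math.exp(-rate * t)
constant = lambda u: 1.0                 # indifference: reproduces Eq. (1.1)
decreasing = lambda u: math.exp(-u)      # conservatism: the exp(-u) choice in the text

for u in (0.5, 1.0, 2.0):
    print(u,
          round(bet_on_residual_life(g_bar, constant, 20.0, u), 4),
          round(bet_on_residual_life(g_bar, decreasing, 20.0, u), 4))
```

Under this exponential model the constant likelihood gives $e^{-0.1u}$ and the decreasing one gives $e^{-1.1u}$. The decreasing likelihood therefore always bets less on survival, which matches its reading as conservatism.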

2.3. Using Bayes' Law, conventionally

By a conventional use of Bayes' Law we mean the introduction of a parametric model into the analysis, followed by a prior to posterior transformation of our uncertainty about the parameters. When we do so, an argument similar to the one of Section 2.2 can be made, possibly with more transparency because of the concrete nature of the set-up. Suppose, then, that $P(X \le x \mid \theta) = G(x \mid \theta)$, where $\theta > 0$ is some unknown parameter.

Fig. 1. Likelihood of the event $(X > x + u)$ with $(X > x)$ fixed.

Using standard arguments involving the law of total probability, we may write

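$$P(X \ge x + u \mid X \ge x) = \int_{\Theta} P(X \ge x + u \mid X \ge x, \theta)\, \pi(\theta \mid X \ge x)\, d\theta,$$

where by Bayes' Law

$$\pi(\theta \mid X \ge x) \propto P(X \ge x \mid \theta)\, \pi(\theta);$$

$\pi(\theta)$ is our prior distribution of $\theta > 0$.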

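With the event $(X \ge x)$ as data, the above relationship can be written as

$$P(X \ge x + u; X \ge x) = \int_{\Theta} P(X \ge x + u \mid \theta; X \ge x)\, \pi(\theta; X \ge x)\, d\theta, \tag{2.3}$$

with

$$\pi(\theta; X \ge x) \propto \mathcal{L}(\theta; X \ge x)\, \pi(\theta); \tag{2.4}$$

$\mathcal{L}(\theta; X \ge x)$ is the likelihood of $\theta$, with $X \ge x$ taken to be fixed, known, and also assumed to be credible.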

Were we to subscribe to the principle of conditionalization, then $\mathcal{L}(\theta; X \ge x)$ will be prescribed by our chosen model $G(x \mid \theta)$. Otherwise, we are free to choose any other meaningful form for $\mathcal{L}(\theta; X \ge x)$, and thus our answers to $P(X \ge x + u; X \ge x)$ could be different. The example below illustrates this point.

Let $G(x \mid \theta) = 1 - \exp(-\theta x)$, an exponential distribution with mean $1/\theta$, $\theta > 0$, and let our prior on $\theta$ be a gamma distribution with scale (shape) parameter 1 ($k$). This is a natural conjugate prior for $\theta$, though any other prior will also do. Then

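$$P(X \ge x + u; X \ge x) = \int_0^\infty P(X \ge x + u \mid \theta; X \ge x)\, \pi(\theta; X \ge x)\, d\theta = \int_0^\infty e^{-u\theta}\, \pi(\theta; X \ge x)\, d\theta,$$

and

$$\pi(\theta; X \ge x) \propto \mathcal{L}(\theta; X \ge x)\, e^{-\theta}\theta^{k-1}/\Gamma(k).$$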

When $\mathcal{L}(\theta; X \ge x) = e^{-\theta x}$ (which is what the principle of conditionality would mandate, and which is what is conventionally done), it can be verified that the posterior distribution of $\theta$ is also a gamma with scale [shape] $(x + 1)$ [$k$]; i.e.

$$\pi(\theta; X \ge x) = e^{-\theta(x + 1)}\,\theta^{k-1}(x + 1)^k/\Gamma(k).$$

It now follows that

$$P(X \ge x + u; X \ge x) = \left(\frac{x + 1}{x + u + 1}\right)^k. \tag{2.5}$$


As an aside, if the prior on $\theta$ were taken to be the improper prior $\pi(\theta) = 1$, $\theta > 0$, then $P(X \ge x + u; X \ge x) = x/(x + u)$. This assessment of residual life is similar, but not identical, to that of Eq. (2.5) with $k = 1$.
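The closed form (2.5) can be checked numerically. The Python sketch below integrates $e^{-u\theta}$ against the gamma posterior with shape $k$ and scale (rate) $x + 1$ using a midpoint rule. It then compares the result with $((x+1)/(x+u+1))^k$, and it also evaluates the improper-prior answer $x/(x+u)$ for comparison. The grid size, the integration cutoff, the test values and the function names are our own choices, not part of the paper.

```python
import math

def residual_closed_form(x: float, u: float, k: float) -> float:
    """Eq. (2.5): ((x + 1) / (x + u + 1)) ** k."""
    return ((x + 1.0) / (x + u + 1.0)) ** k

def residual_improper_prior(x: float, u: float) -> float:
    """Improper prior pi(theta) = 1 gives x / (x + u)."""
    return x / (x + u)

def residual_numeric(x: float, u: float, k: float, n_steps: int = 100_000) -> float:
    """Midpoint-rule integral of exp(-u*theta) against Gamma(shape=k, rate=x+1)."""
    rate = x + 1.0
    # Heuristic cutoff far into the posterior tail; adequate for k >= 1 (assumption).
    theta_max = (k + 40.0 * math.sqrt(k) + 40.0) / rate
    h = theta_max / n_steps
    log_norm = k * math.log(rate) - math.lgamma(k)
    total = 0.0
    for i in range(n_steps):
        theta = (i + 0.5) * h
        log_density = log_norm + (k - 1.0) * math.log(theta) - rate * theta
        total += math.exp(log_density - u * theta)
    return total * h

x, u, k = 10.0, 5.0, 3.0
print(residual_closed_form(x, u, k), residual_numeric(x, u, k))
print(residual_improper_prior(x, u), residual_closed_form(x, u, 1.0))
```

The second line prints the improper-prior answer next to Eq. (2.5) with $k = 1$. This shows numerically that the two are close but not identical.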

Suppose now that one were not to subscribe to the principle of conditionality and chose $\mathcal{L}(\theta; X \ge x) = c$; i.e. the likelihood is a constant $c > 0$. Then the posterior of $\theta$ would equal its prior, and Eq. (2.5) would become $(u + 1)^{-k}$. Here the effect of $x$ vanishes, because in choosing a flat likelihood one essentially says that, irrespective of what $x$ is, an equal weight is given to all values of $\theta$. Clearly, this choice for a likelihood is not appealing. However, the following choice for $\mathcal{L}(\theta; X \ge x)$ appears to be a more sensible alternative.

Suppose that instead of choosing $\mathcal{L}(\theta; X \ge x) = \exp(-\theta x)$, a decreasing function of $\theta$, one were to choose $\mathcal{L}(\theta; X \ge x) = \exp(-\theta\beta x)$ for some $\beta > 0$. The likelihood would still be a decreasing function of $\theta$, but the rate of decrease would vary, depending on the value of $\beta$; see Fig. 2.
For $\beta > 0$, Eq. (2.5) would become

$$P(X \ge x + u; X \ge x) = \left(\frac{\beta x + 1}{\beta x + u + 1}\right)^k, \tag{2.6}$$


so that the introduction of a $\beta$ in the likelihood amounts to assigning a weight $\beta$ to the observed value of $x$. In some scenarios this could be a desirable feature to have, say when the accuracy (i.e. the credibility) of the observed $x$ is suspect. The choice $\beta > (<)\ 1$ would inflate (deflate) $x$, and this in turn would cause the likelihood to decay faster (slower) than the conventional $\exp(-\theta x)$. Since $\theta$ is the reciprocal of the mean time to failure, accentuating large values of $\theta$, as the choice $\beta < 1$ would tend to do, boils down to accentuating small values of the mean time to failure and thence small values of the residual life. The reverse holds for $\beta > 1$. The choice $\beta = 1$ encapsulates full faith in the observed $x$ and also an adherence to the principle of conditionality. Eqs. (2.5) and (2.6) support our claim that the introduction of a parametric model increases the transparency of the point we are trying to make.
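The effect of $\beta$ in (2.6) is easy to tabulate. In the Python sketch below, the function name and the values $x = 50$, $k = 2$ are illustrative assumptions. Setting $\beta = 1$ reproduces Eq. (2.5), and letting $\beta \to 0$ approaches the flat-likelihood answer $(u + 1)^{-k}$. The assessed survival probability also increases with $\beta$, in line with the discussion above.

```python
def residual_beta(x: float, u: float, k: float, beta: float) -> float:
    """Eq. (2.6): ((beta*x + 1) / (beta*x + u + 1)) ** k, for beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    a = beta * x + 1.0  # posterior scale (rate) of theta when x is weighted by beta
    return (a / (a + u)) ** k

x, k = 50.0, 2.0
for beta in (0.5, 1.0, 2.0):
    print(beta, [round(residual_beta(x, u, k, beta), 4) for u in (5.0, 20.0, 50.0)])
```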


2.3.1.  Discussion:  the  advantage  of  parametric  models 

Parametric  models  are  used  because  they  facilitate  a  coherent  updating  of  the  assessed  uncertainties  via  a 
mechanistic  application  of  Bayes’  formula.  The  example  of  Section  2.3.2  underscores  this  point.  By  contrast, 
the  direct  approach  of  Section  2.2  requires  of  the  user  a  fresh  specification  of  the  likelihood  every  time  new 
evidence  becomes  available.  This  process,  besides  being  cumbersome,  has  the  danger  of  leading  one  to 
incoherence should one not be thoughtful about one's specifications. The disadvantage of parametric models is that the chosen model may not be an accurate reflection of reality. All the same, the computational advantage offered by parametric models outweighs the disadvantage of misspecification, hence their common use.


2.3.2.  Application  to  survival  time  data  on  winding  life 

To illustrate the workings of the material of this section we consider here some service life data on "field windings" of generators given by Nelson (2000). The data below, abstracted from Nelson (2000, Table 1), consist of months in service of failed and unfailed windings. The 16 ranked failure and survival times, in months, with the former tagged by an asterisk, are

31.7*, 39.2*, 57.5*, 65.0, 65.8*, 70.0*, 75.0, 75.0, 87.5, 88.3, 94.2, 101.7, 105.8*, 109.2, 110.0*, and 130.0.


Observe that seven of the 16 field windings have experienced failures and, of the nine that have not, the largest (smallest) service life is 130 (65) months. Suppose that, for the purposes of maintenance planning, we are interested in the probability of any one of the surviving units not failing for an additional $u > 0$ months. For the sake of discussion let us pick the unit with the largest accumulated life. That is, we need to assess $P(X \ge 130 + u; d)$, where $d$ denotes the life history data given above.
Assuming that $P(X \ge x \mid \theta) = \exp(-\theta x)$, with a gamma prior for $\theta$ with scale (shape) parameter 1 ($k$), it can be verified that under an adherence to the principle of conditionality, the posterior distribution of $\theta$ is also a gamma with scale $\left(\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} t_j + 1\right)$ and shape $k + n$, where $\sum_{i=1}^{m} x_i$ is the sum of the $m$ survival times and $\sum_{j=1}^{n} t_j$ is the sum of the $n$ failure times. When such is the case, we have, as an analogue to Eq. (2.5), the result that for any unfailed unit that has experienced a service life of $x$,

$$P(X \ge x + u; \bullet) = \left(\frac{\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} t_j + 1}{\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} t_j + u + 1}\right)^{k+n}. \tag{2.7}$$


Eq.  (2.7)  when  invoked — for  k  —  5 — in  the  context  of  the  surviving  unit  with  an  accumulated  service  life  of 
130  months  and  the  life  history  data  given  above  yields,  for  0, 


$$P(X \ge 130 + u;\, d) = \left( \frac{1306.9}{1306.9 + u} \right)^{12}. \qquad (2.8)$$


A plot of $P(X \ge 130 + u;\, d)$ versus $u$, for $u \ge 0$, is shown as the boldface curve of Fig. 3.
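As a numerical check on Eq. (2.8), the following minimal Python sketch recomputes the posterior gamma parameters from the field-winding data and evaluates Eq. (2.7). It assumes, as in the text, the exponential model with a gamma prior of scale 1 and shape $k = 5$; the function names are illustrative only. Because the exponential model is memoryless, the unit's 130 months of service enter only through the total time on test, which is why $x$ does not appear on the right-hand side of Eq. (2.7).

```python
# Field-winding data from the text (months); failures were marked with an asterisk.
FAILURES = [31.7, 39.2, 57.5, 65.8, 70.0, 105.8, 110.0]
SURVIVALS = [65.0, 75.0, 75.0, 87.5, 88.3, 94.2, 101.7, 109.2, 130.0]


def posterior_gamma(k: float = 5.0, prior_scale: float = 1.0) -> tuple[float, float]:
    """Return (shape, scale) of the gamma posterior of theta under conditionality."""
    total_time = sum(FAILURES) + sum(SURVIVALS)   # sum of x_i* plus sum of t_i
    shape = k + len(FAILURES)                     # k + n
    scale = total_time + prior_scale              # sum x_i* + sum t_i + 1
    return shape, scale


def residual_survival(u: float, k: float = 5.0) -> float:
    """Eq. (2.7): P(X >= x + u; d) for any unfailed unit, u >= 0."""
    shape, scale = posterior_gamma(k)
    return (scale / (scale + u)) ** shape


if __name__ == "__main__":
    shape, scale = posterior_gamma()
    print(f"shape = {shape:g}, scale = {scale:.1f}")   # prints: shape = 12, scale = 1306.9
    for u in (0, 12, 24, 60, 120):
        print(u, round(residual_survival(u), 4))
```

The printed shape and scale reproduce the exponent 12 and the constant 1306.9 of Eq. (2.8).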

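Were the principle of conditionality not adhered to, and the likelihood function instead modulated by the constant $\beta > 0$, then our analogue to Eq. (2.6) would be

$$P(X \ge x + u;\, \bullet) = \left( \frac{\beta\left(\sum_{i=1}^{m} x_i^* + \sum_{i=1}^{n} t_i\right) + 1}{\beta\left(\sum_{i=1}^{m} x_i^* + \sum_{i=1}^{n} t_i\right) + u + 1} \right)^{k+n}. \qquad (2.9)$$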


Eq. (2.9), when invoked in the context of the scenario leading up to Eq. (2.8) for $\beta = 5$ and $\beta = 2$, results in the dotted curves of Fig. 3. Our assessed survival probability thus depends on the form chosen for the likelihood. In principle, the likelihood plays a more crucial role than the prior, because whereas the prior gets updated with new evidence, the likelihood stays put from the start.


3.  Conclusion 

The innocuously simple problem of assessing conditional probabilities can become riddled with issues, both philosophical and technical, when the conditioning event becomes a reality. The cleanest way to approach it is through Bayes’ Law. When this is done, it can be seen that the standard answer arises as a special case under the assumption of a constant likelihood. Other forms of the likelihood will lead to other answers. Since the choice of a likelihood is an assessor’s prerogative, just like the choice of a probability model, there is no unique and correct way to bet on residual life. However, the traditional answer (presumably the one that will be subscribed to by card-carrying probabilists) will be the correct and unique answer, but only when its argument is sheltered under the philosophical (or behaviouristic) principle of conditionalization.
Abstract: The notion of quality of life (QoL) has recently received a high profile in the biomedical, the bioeconomic, and the biostatistical literature. This is despite the fact that the notion lacks a formal definition. The literature on QoL is fragmented and diverse because each of its constituents emphasizes its own point of view. Discussions have centered around ways of defining QoL, ways of making it operational, and ways of making it relevant to medical decision making. An integrated picture showing how all of the above can be brought together is desirable. The purpose of this chapter is to propose a framework that does so. This we do via a Bayesian hierarchical model. Our framework includes linkages with item response theory, survival analysis, and accelerated testing. More important, it paves the way for proposing a definition of QoL.

This is an expository chapter. Our aim is to provide an architecture for conceptualizing the notion of QoL and its role in health care planning. Our approach could be of relevance to other scenarios such as educational, psychometric, and sociometric testing, marketing, sports science, and quality assessment.

Keywords and Phrases: Health care planning, hierarchical modeling, information integration, survival analysis, quality control, utility theory


26.1  Introduction  and  Overview 

A general perspective on the various aspects of the QoL problem can be gained from the three-part paper of Fitzpatrick et al. (1992). For an appreciation of the statistical issues underlying QoL, the recent book by Mesbah et al. (2002) is a good starting point. In the same vein is the paper of Cox et al. (1992) with the striking title, “Quality of Life Assessment: Can We Keep It Simple?” Reviewing the above and other related references on this topic, it is our position that QoL assessment can possibly be kept simple, but not too simple! To get a sense of why we arrive at this view, we start by selectively quoting phrases from
(i) “Many instruments reflect the multidimensionality of QoL,” Fitzpatrick et al. (1992).

(j)  “Summing  disparate  dimensions  is  not  recommended,  because  contrary 
trends  for  different  aspects  of  QoL  are  missed,”  Fitzpatrick  et  al.  (1992). 

(k) “In health economics QoL measures have . . . more controversially (become) the means of prioritizing funding,” Fitzpatrick et al. (1992).

(l) “The best understood application of QoL measures is in clinical trials, where they provide evidence of the effects of interventions,” Fitzpatrick et al. (1992).

There is a variant of the notion of QoL, namely, the quality adjusted life (QAL). This variant is designed to incorporate the QoL notion into an analysis of survival data and history. A motivation for introducing QAL has been the often expressed view that medical interventions may prolong life, but that the discomfort that these may cause could offset any increase in longevity. The following four quotes provide some sense of the meaning of QAL.

(m)  “QAL  is  an  index  combining  survival  and  QoL...,”  Fitzpatrick  et  al. 
(1992). 

(n) “QAL is a measure of the medical and psychological adjustments needed to induce an affordable QoL for patients undergoing problems,” Sen (2002).

(o) “QAL is a patients’ survival time weighted by QoL experience where the weights are based on utility values - measured on the unit interval,” Cole and Kilbridge (2002).

(p) “QAL has emerged as an important yardstick in many clinical studies; this typically involves the lifetime as the primary endpoint with the incorporation of QAL or QoL measures through appropriate utility scores that are obtained through appropriate item analysis schemes,” cf. Zhao and Tsiatis (2000).

26.1.2  Overview  of  this  chapter 

The  above  quotes  encapsulate  the  essence  of  the  QoL  and  its  variant,  the  QAL. 
They  indicate  the  diverse  constituencies  that  are  attracted  to  a  QoL  metric  and 
the  controversies  that  each  constituency  raises.  For  our  objectives,  the  quotes 
provide  ingredients  for  proposing  a  definition  of  QoL  and  developing  a  metric  for 
measuring  it.  As  a  first  step,  it  appears  to  us  that  any  satisfactory  discourse  on 
QoL  should  encompass  the  involvement  of  three  interest  groups,  the  clinicians, 
the  patients  (or  their  advocates),  and  an  economic  entity,  such  as  managers  of 


Figure 26.1. $\mathcal{D}$’s decision tree using QAL consideration (the unifying perspective of QAL).

The quantities $\theta(\mathcal{P})$, $\theta(\mathcal{C})$, and $\theta(\mathcal{D})$ are explained later in Sections 26.3 through 26.5. The hexagon denotes $\mathcal{D}$’s decision node and the triangle is a random node $\mathcal{R}$. At the decision node, $\mathcal{D}$ takes one of several possible actions available to $\mathcal{D}$; let these actions be denoted by a generic $d$. At $\mathcal{R}$, we would see the possible outcomes $c$ of decision $d$. The quantity $U(d, c)$ at the terminus of the tree represents to $\mathcal{D}$ the utility of a decision $d$ when the outcome is $c$. With medical decisions it is often the case that $d$ influences $c$.

The quantity $\theta(\mathcal{D})$ is $\mathcal{P}$’s QoL assessed by $\mathcal{D}$ subsequent to fusing the inputs of $\mathcal{P}$ and $\mathcal{C}$; $\theta(\mathcal{D}) \in [0, 1]$. Let $P(X > x)$ denote $\mathcal{P}$’s survival function; this is assessed via survival data history on individuals judged exchangeable with $\mathcal{P}$, plus other covariate information that is specific to $\mathcal{P}$. Together with $P(X > x)$ and $\theta(\mathcal{D})$, $\mathcal{D}$ is able to assess $\mathcal{P}$’s QAL. There are two strategies for doing this. One is through the accelerated life model, whereby $\mathrm{QAL}(x) = P(X\theta(\mathcal{D}) > x)$. The other is via a proportional life model, whereby QAL(x) = (P(X >
Note that the QAL metric is, like the survival function, indexed by $x$. The effect of both of the above is to dampen the survival function of the
Figure 26.2. Envelope showing the range of values for

such omnibus questions generate a response on a multinomial scale, but here we assume that $\mathcal{P}$’s response takes values in the continuum $[0, 1]$, with 1 denoting excellent. Let $\theta(\mathcal{P})$ denote $\mathcal{P}$’s response to an omnibus question.

26.3.1 The case of a single dimension: $\mathcal{D}$’s assessment of $\theta_j$

Given the responses $\mathbf{x}_j = (x_{1j}, \ldots, x_{kj})$ to a set of $k$ questions pertaining to dimension $j$, the likelihood of $\theta_j$ and $\boldsymbol{\beta}_j = (\beta_{1j}, \ldots, \beta_{kj})$ under the Rasch model is


(26.1) 


for $\theta_j \in (0, 1]$ and $-\infty < \beta_{1j} < \cdots < \beta_{kj} < +\infty$.

If we suppose, as is reasonable to do, that $\theta_j$ and $\boldsymbol{\beta}_j$ are a priori independent, with $\pi(\theta_j)$ and $\pi(\boldsymbol{\beta}_j)$ denoting their respective prior densities, then by treating $\boldsymbol{\beta}_j$ as a nuisance parameter and integrating it out, the posterior distribution of $\theta_j$ is


(26.2) 


The question now arises as to what $\pi(\theta_j)$ and $\pi(\boldsymbol{\beta}_j)$ should be. In order to answer this question we first need to ask who specifies these priors: $\mathcal{P}$, $\mathcal{C}$, or $\mathcal{D}$? The answer has to be either $\mathcal{C}$ or $\mathcal{D}$, because $\mathcal{P}$ cannot specify a prior and then respond to a questionnaire. Furthermore, in principle, these priors have to be $\mathcal{D}$’s priors because it is $\mathcal{D}$’s decision process that we are describing. Thus,


A Bayesian Ponders “The Quality of Life”




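The quantity $\mathcal{L}(\theta(\mathcal{D}); \theta(\mathcal{P}), \theta(\mathcal{C}))$ denotes $\mathcal{D}$’s likelihood that $\mathcal{P}$ will declare a $\theta(\mathcal{P})$ and $\mathcal{C}$ will declare a $\theta(\mathcal{C})$, were $\theta(\mathcal{D})$ to be a measure of $\mathcal{P}$’s overall quality of life. This likelihood will encapsulate any biases that $\mathcal{P}$ and $\mathcal{C}$ may have in declaring their $\theta(\mathcal{P})$ and $\theta(\mathcal{C})$, respectively, as perceived by $\mathcal{D}$, and also any correlations between the values declared by $\mathcal{P}$ and $\mathcal{C}$. The nature of this likelihood remains to be investigated. The quantity $\pi_{\mathcal{D}}(\theta(\mathcal{D}))$ is $\mathcal{D}$’s prior for $\theta(\mathcal{D})$, and following our previous convention, we assume that it is uniform on $[0, 1]$. This completes our discussion of $\mathcal{D}$’s assessment of $\theta(\mathcal{D})$. It involves $\theta(\mathcal{P})$ and $\theta(\mathcal{C})$, and connotes information integration by $\mathcal{D}$ at one level.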
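Because the likelihood $\mathcal{L}(\theta(\mathcal{D}); \theta(\mathcal{P}), \theta(\mathcal{C}))$ is left unspecified, the sketch below is purely illustrative of the fusion step. It assumes, hypothetically, that $\mathcal{D}$ models each declaration as normal around $\theta(\mathcal{D})$ plus a perceived bias, with the two declarations independent (so no correlation is encoded) and with truncation to $[0, 1]$ ignored in the likelihood. The uniform prior on $[0, 1]$ is the one stated above, and the posterior is computed on a grid; all names and default values are assumptions.

```python
from statistics import NormalDist


def fuse_declarations(theta_p: float, theta_c: float,
                      bias_p: float = 0.0, bias_c: float = 0.0,
                      sd_p: float = 0.1, sd_c: float = 0.1,
                      n: int = 2000) -> tuple[list[float], list[float], float]:
    """Grid posterior of theta(D) given declared theta(P) and theta(C).

    Hypothetical likelihood: theta(P) ~ N(theta(D) + bias_p, sd_p^2) and
    theta(C) ~ N(theta(D) + bias_c, sd_c^2), independently.
    Prior: uniform on [0, 1], so it only contributes a constant factor.
    """
    grid = [(i + 0.5) / n for i in range(n)]          # midpoints of [0, 1]
    unnorm = [NormalDist(t + bias_p, sd_p).pdf(theta_p)
              * NormalDist(t + bias_c, sd_c).pdf(theta_c)
              for t in grid]
    total = sum(unnorm)
    weights = [w / total for w in unnorm]             # normalized posterior masses
    mean = sum(t * w for t, w in zip(grid, weights))
    return grid, weights, mean


if __name__ == "__main__":
    # P declares 0.6, but D believes P overstates by 0.1; C declares 0.5.
    _, _, m = fuse_declarations(0.6, 0.5, bias_p=0.1)
    print(f"posterior mean of theta(D): {m:.3f}")
```

The bias terms are where $\mathcal{D}$'s perception of $\mathcal{P}$ and $\mathcal{C}$ enters; a correlated bivariate likelihood would be the natural next refinement.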

26.4.2 Encoding the positive dependence between the $\theta_j$’s

One way to capture the positive dependence between the $\theta_j$’s is through mixtures of independent sequences. Specifically, we suppose, as it is reasonable to do, that given $\theta(\mathcal{D})$ the $\theta_j$’s are independent, with $\theta_j$ having a probability density function of the form $f_{\mathcal{D}}(\theta_j \mid \theta(\mathcal{D}))$, $j = 1, \ldots, m$. The subscript $\mathcal{D}$ associated with $f$ denotes the fact that the probability density in question is that of $\mathcal{D}$. A strategy for obtaining $f_{\mathcal{D}}(\theta_j \mid \theta(\mathcal{D}))$ is described later, subsequent to Equation (26.5).
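The mechanism by which conditional independence given $\theta(\mathcal{D})$ induces positive dependence among the $\theta_j$'s can be seen in a small simulation. The sketch assumes, for illustration only, that $\theta(\mathcal{D})$ is uniform on $[0, 1]$ and that, given $\theta(\mathcal{D})$, each $\theta_j$ is uniform on the window $[\theta(\mathcal{D}) - \epsilon, \theta(\mathcal{D}) + \epsilon]$ clipped to $[0, 1]$. This window form is a hypothetical stand-in for $f_{\mathcal{D}}(\theta_j \mid \theta(\mathcal{D}))$, loosely in the spirit of the uniform kernel suggested later in the text, and not a derivation from it.

```python
import math
import random
import statistics


def sample_thetas(m: int, eps: float, n: int, seed: int = 0) -> list[list[float]]:
    """Draw n vectors (theta_1, ..., theta_m) from the hypothetical mixture."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        theta_d = rng.random()                        # theta(D) ~ Uniform[0, 1)
        lo, hi = max(0.0, theta_d - eps), min(1.0, theta_d + eps)
        # Given theta(D), the theta_j are drawn independently.
        draws.append([rng.uniform(lo, hi) for _ in range(m)])
    return draws


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    draws = sample_thetas(m=2, eps=0.1, n=20_000)
    r = pearson([d[0] for d in draws], [d[1] for d in draws])
    print(f"correlation of theta_1 and theta_2: {r:.3f}")
```

Although each $\theta_j$ is drawn independently once $\theta(\mathcal{D})$ is fixed, marginally they move together because they share $\theta(\mathcal{D})$; shrinking $\epsilon$ strengthens the dependence.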

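With $\pi_{\mathcal{D}}(\theta_j; \mathbf{x}_j)$, $j = 1, \ldots, m$, and $\pi_{\mathcal{D}}(\theta(\mathcal{D}))$ at hand, $\mathcal{D}$ may extend the conversation to $\theta(\mathcal{D})$ and obtain the joint distribution of $\theta_1, \ldots, \theta_m$ as

$$f^{*}_{\mathcal{D}}(\theta_1, \ldots, \theta_m; \mathbf{x}_1, \ldots, \mathbf{x}_m, \theta(\mathcal{P}), \theta(\mathcal{C})) = \int_{\theta(\mathcal{D})} f_{\mathcal{D}}(\theta_1, \ldots, \theta_m \mid \theta(\mathcal{D}); \mathbf{x}_1, \ldots, \mathbf{x}_m)\, \pi_{\mathcal{D}}(\theta(\mathcal{D}))\, d\theta(\mathcal{D}); \qquad (26.4)$$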

in writing out the above, we have assumed that the $\mathbf{x}_j$’s, $j = 1, \ldots, m$, have no bearing on $\theta(\mathcal{D})$ once $\theta(\mathcal{P})$ and $\theta(\mathcal{C})$ have been declared by $\mathcal{P}$ and $\mathcal{C}$, respectively. Applying the multiplication rule, and supposing that the $\mathbf{x}_i$’s, $i \ne j$, have no bearing on $\theta_j$, $j = 1, \ldots, m$, the right-hand side of the above equation becomes

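$$\int_{\theta(\mathcal{D})} \prod_{j=1}^{m} f_{\mathcal{D}}(\theta_j \mid \theta(\mathcal{D}); \mathbf{x}_j)\, \pi_{\mathcal{D}}(\theta(\mathcal{D}))\, d\theta(\mathcal{D}). \qquad (26.5)$$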

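We now invoke Bayes’ law to write

$$f_{\mathcal{D}}(\theta_j \mid \theta(\mathcal{D}); \mathbf{x}_j) \propto f_{\mathcal{D}}(\theta(\mathcal{D}) \mid \theta_j; \mathbf{x}_j)\, \pi_{\mathcal{D}}(\theta_j; \mathbf{x}_j),$$

where $f_{\mathcal{D}}(\theta(\mathcal{D}) \mid \theta_j; \mathbf{x}_j)$ is $\mathcal{D}$’s probability density of $\theta(\mathcal{D})$ were $\mathcal{D}$ to know $\theta_j$, and in the light of $\mathbf{x}_j$. A strategy for specifying this probability density is to suppose that $\theta(\mathcal{D})$ is uniform and symmetric around $\theta_j$ with endpoints $\theta_j \pm \epsilon$,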
There could be other possible ways for defining QoL. A few of these would be to consider $\min_j(\theta_j)$, $\max_j(\theta_j)$, or $\mathrm{mean}_j(\theta_j)$, and to let QoL be a quantity such as

$$\mathrm{QoL} = P_{\mathcal{D}}\big(\min_j(\theta_j) > a\big)$$

for some $a \in [0, 1]$. Whereas the proposed definition(s) are appropriate in all situations, it is not clear whether a unique definition of QoL is palatable to all constituents. We see some merits to having a unique yardstick.
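A Monte Carlo sketch of the candidate definition $\mathrm{QoL} = P_{\mathcal{D}}(\min_j \theta_j > a)$, together with the max and mean variants mentioned above, is given below. The joint distribution of $(\theta_1, \ldots, \theta_m)$ is a hypothetical stand-in for Eq. (26.4): $\theta(\mathcal{D})$ uniform on $[0, 1]$, and given $\theta(\mathcal{D})$, each $\theta_j$ independently uniform on $[\theta(\mathcal{D}) - \epsilon, \theta(\mathcal{D}) + \epsilon]$ clipped to $[0, 1]$. The names and parameter values are illustrative assumptions.

```python
import random
import statistics


def sample_thetas(m: int, eps: float, n: int, seed: int = 0) -> list[list[float]]:
    """Hypothetical joint draws of (theta_1, ..., theta_m) via a mixture on theta(D)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        theta_d = rng.random()
        lo, hi = max(0.0, theta_d - eps), min(1.0, theta_d + eps)
        out.append([rng.uniform(lo, hi) for _ in range(m)])
    return out


def qol(draws: list[list[float]], a: float, summary=min) -> float:
    """Estimate P_D(summary_j(theta_j) > a) as a fraction of the draws."""
    return sum(summary(d) > a for d in draws) / len(draws)


if __name__ == "__main__":
    draws = sample_thetas(m=4, eps=0.15, n=10_000)
    for a in (0.25, 0.5, 0.75):
        print(a, qol(draws, a, min), qol(draws, a, statistics.fmean), qol(draws, a, max))
```

For any fixed set of draws, the min-based index can never exceed the mean- or max-based ones, so the min version is the most demanding yardstick of the three.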


26.6  Summary  and  Conclusions 

In this chapter we have proposed an approach for addressing a contemporary problem that can arise in many scenarios, the one of interest to us coming from the health sciences vis-à-vis the notion of “quality of life.” What seems to be common to these scenarios is information from diverse sources that needs to be integrated, considerations of multidimensionality, and the need to make decisions whose consequences are of concern. Previous work on problems of this type has been piecemeal, with statisticians mainly focusing on the frequentist aspects of item response models. Whereas such approaches have the advantage of “objectivity,” they do not pave the path for integrating information from multiple sources. The approach of this chapter is based on a hierarchical Bayesian architecture. In principle, our architecture is able to do much, if not all, of what is required by the users of QoL indices. The architecture also leads to a strategy by which QoL can be defined and measured in a formal manner. The current literature on this topic does not address the matter of definition. This chapter is expository in the sense that it outlines an encompassing and unifying approach for addressing the QoL and QAL problem. The normative development of this chapter has the advantage of coherence. However, this coherence is gained at the cost of simplicity. Some multidimensional priors with a restricted sample space are involved, and these remain to be articulated. So do some likelihoods. Finally, there is the matter of computations. However, all these limitations are only of a technical nature and can eventually be addressed. We are continuing our work on such matters, including an application involving real data and real scenarios. The purpose of this chapter was to show how a Bayesian approach can address a contemporary problem, and the overall strategy that can be used to develop such an approach. The novel aspects of this chapter are: the conceptualization of the QoL problem as a scenario involving three groups of individuals, a structure whereby information from several sources can be integrated, and a definition of the notion of QoL.
Choosing a Coverage Probability for Prediction Intervals


Joshua LANDON and Nozer D. SINGPURWALLA


Coverage probabilities for prediction intervals are germane to filtering, forecasting, previsions, regression, and time series analysis. It is a common practice to choose the coverage probabilities for such intervals by convention or by astute judgment. We argue here that coverage probabilities can be chosen by decision theoretic considerations. But to do so, we need to specify meaningful utility functions. Some stylized choices of such functions are given, and a prototype approach is presented.

KEY WORDS: Confidence intervals; Decision making; Filtering; Forecasting; Previsions; Time series; Utilities.


1.  INTRODUCTION  AND  BACKGROUND 

Prediction is perhaps one of the most commonly undertaken activities in the physical, the engineering, and the biological sciences. In the econometric and the social sciences, prediction generally goes under the name of forecasting, and in the actuarial and the assurance sciences under the label life-length assessment. Automatic process control, filtering, and quality control are some of the engineering techniques that use prediction as a basis of their modus operandi.

Statistical techniques play a key role in prediction, with regression, time series analysis, and dynamic linear models (also known as state space models) being the predominant tools for producing forecasts. The importance of statistical methods in forecasting was underscored by Pearson (1920), who claimed that prediction is the “fundamental problem of practical statistics.” Similarly, de Finetti (1972, Chaps. 3 and 4) labeled prediction as “prevision,” and made it the centerpiece of his notion of “exchangeability” and of a subjectivistic Bayesian development around it. In what follows, we find it convenient to think in terms of regression, time series analysis, and forecasting techniques as vehicles for discussing an important aspect of prediction.


Joshua Landon is Post Doc, and Nozer D. Singpurwalla is Professor, Department of Statistics and Department of Decision Sciences, The George Washington University, Washington, DC 20052 (E-mail: nozer@gwu.edu). Supported by ONR Contract N00014-06-1-0037 and the ARO Grant W911NF-05-1-0209. The student retention problem was brought to our attention by Dr. Donald Lehman. The detailed comments of three referees and an Associate Editor have broadened the scope of the article. Professor Fred Joust made us aware of the papers by Granger, and by Tay and Wallis.


We start by noting that inherent to the above techniques is an underlying distribution (or error) theory, whose net effect is to produce predictions with an uncertainty bound; the normal (Gaussian) distribution is typical. An exception is Gardner (1988), who used a Chebychev inequality in lieu of a specific distribution. The result was a prediction interval whose width depends on a coverage probability; see, for example, Box and Jenkins (1976, p. 254), or Chatfield (1993). It has been a common practice to specify coverage probabilities by convention, the 90%, the 95%, and the 99% being typical choices. Indeed, Granger (1996) stated that academic writers concentrate almost exclusively on 95% intervals, whereas practical forecasters seem to prefer 50% intervals. The larger the coverage probability, the wider the prediction interval, and vice versa. But wide prediction intervals tend to be of little value [see Granger (1996), who claimed 95% prediction intervals to be “embarrassingly wide”]. By contrast, narrow prediction intervals tend to be risky in the sense that the actual values, when they become available, could fall outside the prediction interval. Thus, the question of what coverage probability one should choose in any particular application is crucial.

1.1  Objective 

The purpose of this article is to make the case that the choice of a coverage probability for a prediction interval should be based on decision theoretic considerations. This would boil down to a trade-off between the utility of a narrow interval and the disutility of an interval that fails to cover an observed value. It is hoped that our approach endows some formality to a commonly occurring problem that seems to have been traditionally addressed by convention and judgment, possibly because utilities are sometimes hard to pin down.

1.2  Related  Issues 

Before proceeding, it is important to note that in the context of this article, a prediction interval is not to be viewed as a confidence interval. The former is an estimate of a future observable value; the latter an estimate of some fixed but unknown (and often unobservable) parameter. Prediction intervals are produced via frequentist or Bayesian methods, whereas confidence intervals can only be constructed via a frequentist argument. The discussion of this article revolves around prediction intervals produced by a Bayesian approach; thus we are concerned here with Bayesian prediction intervals. For an application of frequentist prediction intervals, the article by Lawless and Fredette (2005) is noteworthy; see also the book by Hahn and Meeker (1991, Sect. 2.3) and the article of Beran (1990).

A decision theoretic approach for specifying the confidence coefficient of a confidence interval is not explored here. All the same, it appears that some efforts in this direction were embarked upon by Lindley and Savage [see Savage (1962, p. 173), who also alluded to some work by Lehmann (1958)]. By contrast, a decision theoretic approach for generating prediction intervals has been alluded to by Tay and Wallis (2000) and developed by Winkler (1972). However, Winkler’s aim was not the determination of optimal coverage probabilities, even though the two issues of coverage probability and interval size are isomorphic. Our focus on coverage probability is dictated by its common use in regression, time series analysis, and forecasting.

Finally, predictions and prediction intervals should not be seen as being specific to regression and time series based models. In general they will arise in the context of any probability models used to make previsions, such as the ones used in reliability and survival analysis [see Singpurwalla (2006), Chap. 5].

2.  MOTIVATING  EXAMPLE 

Our interest in this problem was motivated by the following scenario. For purposes of exposition, we shall anchor on this scenario.

• A university wishes to predict the number of freshman students that will be retained to their sophomore year. Suppose that $N$ is the number of freshman students, and $X$ is the number retained to the sophomore year, $X \le N$. Knowing $N$, the university wishes to predict $X$. The prediction is to be accompanied by a prediction interval, and the focus of this article pertains to the width of the interval. The width of the interval determines the amount of funds the university needs to set aside for meeting the needs of the sophomore students. The wider the interval, the greater the reserves; however, large reserves strain the budget. By contrast, the narrower the interval, the greater is the risk of the actual number of sophomores falling outside the interval. This would result in poor budgetary planning due to insufficient or excessive reserves. Thus, a trade-off between the risks of over-budgeting and under-budgeting is called for.

The student retention scenario is archetypal because it arises in several other contexts under different guises. A direct parallel arises in the case of national defense involving an all-volunteer fighting force. Meaningful predictions of the retention of trained personnel are a matter of national security. A more classical scenario is the problem of inventory control, wherein a large volume of stored items ties up capital, whereas too little inventory may result in poor customer satisfaction or emergency actions; see, for example, Hadley and Whitin (1960, Chap. 4). Another (more contemporary) scenario comes from the Basel II accords of the banking industry. Bank regulators need to assess how much capital a bank needs to set aside to guard against financial risks that a bank may face; see Decamps, Rochet, and Roger (2004) for an appreciation. From the biomedical and the engineering sciences arises the problem of predicting survival times subsequent to a major medical intervention or a repair.


In all the above scenarios, the width of the prediction interval is determined by the nature of an underlying probability model and its coverage probability. This point is best illustrated by a specific assumption about the distribution of the unknown $X$; this is done next. But before doing so, it is necessary to remark that neither the literature on inventory control nor that on Basel II accords addresses the issue of optimal coverage probabilities. In the former case, a possible reason could be the difficulties associated with quantifying customer dissatisfaction.


2.1  Distributional  Assumptions 

Suppose that the (posterior) predictive distribution of $X$ obtained via a regression or a time series model is a normal (Gaussian) with mean $\mu$ and variance $\sigma^2$, where $\mu$ and $\sigma^2$ have been pinned down; the normal distribution is typical in these contexts. Then, it is well known [see DeGroot (1970), p. 228] that under a squared error loss for prediction error, $\mu$ is the best predictor of $X$. For a coverage probability of $(1 - \alpha)$, a prediction interval for $X$ may be of the form $(\mu \pm z_{\alpha/2}\sigma)$. Here $z_{\alpha/2}$ is such that for some random variable $W$ having a standard normal distribution, $P(W > z_{\alpha/2}) = \alpha/2$.
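For concreteness, here is a minimal sketch of the interval $(\mu \pm z_{\alpha/2}\sigma)$ using the Python standard library's `statistics.NormalDist`. The values $\mu = 2140$ and $\sigma^2 = 396$ are those reported later in Section 4 for the student retention data; the function name is illustrative.

```python
import math
from statistics import NormalDist


def prediction_interval(mu: float, sigma: float, alpha: float) -> tuple[float, float]:
    """Return (mu - z*sigma, mu + z*sigma) with P(W > z) = alpha/2 for W ~ N(0, 1)."""
    if not 0.0 < alpha < 1.0:
        # alpha = 0 would give z = +infinity, i.e. the interval (-inf, +inf).
        raise ValueError("alpha must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mu - z * sigma, mu + z * sigma


if __name__ == "__main__":
    sigma = math.sqrt(396.0)
    for alpha in (0.50, 0.10, 0.05, 0.01):
        lo, hi = prediction_interval(2140.0, sigma, alpha)
        print(f"coverage {1 - alpha:.0%}: ({lo:.1f}, {hi:.1f}), width {hi - lo:.1f}")
```

Running it shows directly how the width grows as the coverage probability $(1 - \alpha)$ increases.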

The question that we wish to address in this article is, what should $\alpha$ be? A small $\alpha$ will widen the prediction interval, diminishing its value to a user. Indeed, $\alpha = 0$ will yield the embarrassing $(-\infty, +\infty)$ as a prediction interval. By contrast, with large values of $\alpha$, one runs the risk of the prediction interval not covering the actual value (when it materializes). Thus, we need to determine an optimum value of $\alpha$ to use. To address the question posed, we need to introduce utilities, one for the worth of a prediction interval, and the other, a disutility, for the failure of coverage.


3.  CANDIDATE  UTILITY  FUNCTIONS 

Utilities are a key ingredient of decision making, and the principle of maximization of expected utility prescribes the decision (action) to be taken; see, for example, Lindley (1985, p. 71). Utilities measure the worth of a consequence to a decision maker, and disutilities the penalty (or loss) imposed by a consequence. With disutilities, a decision maker’s actions are prescribed by the principle of minimization of expected disutility. The unit of measurement of utilities is a “utile.” However, in practice utilities are measured in terms of monetary units, such as dollars, and this is what we shall assume.

In the context of prediction, we make the natural assumption that, in principle, one prefers a prediction interval of width zero over any other prediction interval. This makes the utility of any prediction interval of nonzero width a disutility. Similarly, the failure of any prediction interval to cover an observed value results in a disutility. Following Winkler (1972), the two disutilities mentioned above are assumed to be additive, though this need not be so. Thus, for the scenario considered here, one endeavors to choose that value of $\alpha$ for which the total expected disutility is a minimum.
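The trade-off can be sketched numerically. The following minimal Python example evaluates the total expected disutility $D(d_\alpha) = c(d_\alpha) + R(d_\alpha)$ on a grid of $\alpha$ values and returns the minimizer, using the illustrative choices the article adopts in Section 4: $c(d_\alpha) = d_\alpha^{1/2}$, $f_1(y) = y^2/40$, $f_2(y) = y^2/10$, and $\sigma^2 = 396$. The closed form $E[(Z - h)^2; Z > h] = (1 + h^2)\{1 - \Phi(h)\} - h\phi(h)$ for standard normal $Z$ is our own standard calculation rather than something stated in the article. A plain grid search is a deliberate choice, because $D$ need not be convex (the square-root term is steep near zero width).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_sq_miss(h: float) -> float:
    """E[(Z - h)^2; Z > h] for Z ~ N(0, 1), i.e. the expected squared overshoot."""
    tail = 1.0 - STD_NORMAL.cdf(h)
    return (1.0 + h * h) * tail - h * STD_NORMAL.pdf(h)


def total_disutility(alpha: float, var: float = 396.0, p: float = 0.5,
                     k_over: float = 40.0, k_under: float = 10.0) -> float:
    """D(d_alpha) = d_alpha**p + R(d_alpha) for a normal predictive distribution."""
    sigma = math.sqrt(var)
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    width = 2.0 * z * sigma                     # d_alpha = 2 z_{alpha/2} sigma
    # The interval mu +/- z*sigma is symmetric, so both tails have standardized
    # threshold z and mu drops out; f_1 and f_2 differ only in their constants.
    risk = var * expected_sq_miss(z) * (1.0 / k_over + 1.0 / k_under)
    return width ** p + risk


def optimal_alpha(step: float = 0.001, **kwargs: float) -> float:
    """Grid search over alpha in (0, 1)."""
    grid = [i * step for i in range(1, round(1.0 / step))]
    return min(grid, key=lambda a: total_disutility(a, **kwargs))


if __name__ == "__main__":
    a_star = optimal_alpha()
    print(f"alpha* = {a_star:.3f}, coverage = {1 - a_star:.3f}, "
          f"D = {total_disutility(a_star):.3f}")
```

Other disutility shapes can be explored by changing `p`, `k_over`, and `k_under`; these parameter names are hypothetical and simply mirror the constants in the text.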
Figure 1. The disutility of noncoverage.

3.1  The  Disutility  of  a  Prediction  Interval 

The width $d_\alpha$ of a prediction interval of the type described in Section 2.1 is $d_\alpha = 2z_{\alpha/2}\sigma$; here the coverage probability is $(1 - \alpha)$. Let $c(d_\alpha)$ be the disutility (or some kind of a dollar penalty) associated with a use of $d_\alpha$. Clearly $c(d_\alpha)$ should be zero when $d_\alpha = 0$, and $c(d_\alpha)$ must increase with $d_\alpha$, since there is a disadvantage to using wide intervals. A possible choice for $c(d_\alpha)$ could be

$$c(d_\alpha) = d_\alpha^{\,p}, \qquad (1)$$

for $p > 0$. When $p < 1$, $c(d_\alpha)$ is a concave increasing function of $d_\alpha$, and when $p > 1$, $c(d_\alpha)$ is convex and increasing in $d_\alpha$. The choice of what $p$ must be depends on the application. In certain applications, such as target tracking, $p < 1$ may be more desirable than $p > 1$; in others, such as econometric forecasting, a convex disutility function may be suitable. The choice of (1) for a disutility function is purely illustrative. The proposed approach is not restricted to any particular choice for $c(d_\alpha)$.
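One way to see the role of $p$ in Equation (1) is to compare $c(d_\alpha) = d_\alpha^p$ across coverage probabilities for a concave ($p = 1/2$) and a convex ($p = 2$) choice. The sketch below is illustrative only; $\sigma = 1$ and the helper names are assumptions.

```python
from statistics import NormalDist


def interval_width(alpha: float, sigma: float) -> float:
    """d_alpha = 2 z_{alpha/2} sigma for a normal predictive distribution."""
    return 2.0 * NormalDist().inv_cdf(1.0 - alpha / 2.0) * sigma


def width_disutility(d: float, p: float) -> float:
    """c(d_alpha) = d_alpha ** p, with c(0) = 0 and c increasing for p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return d ** p


if __name__ == "__main__":
    for coverage in (0.50, 0.80, 0.90, 0.95, 0.99):
        d = interval_width(1.0 - coverage, sigma=1.0)
        print(f"{coverage:.0%}: d = {d:.3f}, "
              f"c(p=1/2) = {width_disutility(d, 0.5):.3f}, "
              f"c(p=2) = {width_disutility(d, 2.0):.3f}")
```

With $p = 1/2$ the marginal penalty for extra width shrinks as intervals widen, whereas with $p = 2$ it grows, which is the concave versus convex distinction drawn above.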

3.2  The  Disutility  of  Noncoverage 

A possible function for the disutility caused by a failure of the prediction interval to cover $x$, a realization of $X$, can be prescribed via the following line of reasoning.

Suppose that $U_\alpha = \mu + z_{\alpha/2}\sigma$ is the upper bound, and $L_\alpha = \mu - z_{\alpha/2}\sigma$ the lower bound, of the prediction interval with coverage probability $(1 - \alpha)$. Let $L(d_\alpha, x)$ denote the disutility or penalty loss (in dollars) in using a prediction interval of width $d_\alpha$ when $X$ reveals itself as $x$. Then $L(d_\alpha, x)$ could be of the form

$$L(d_\alpha, x) = \begin{cases} f_1(x - U_\alpha), & x > U_\alpha, \\ 0, & L_\alpha \le x \le U_\alpha, \\ f_2(L_\alpha - x), & x < L_\alpha, \end{cases} \qquad (2)$$

where $f_1$ and $f_2$ are increasing functions of their arguments, which encapsulate the penalty of $x$ overshooting and undershooting the prediction interval, respectively.

As illustrated in Figure 1, the said functions will generally be convex and increasing because a narrow miss by the interval will matter less than a large miss. Furthermore, these functions need not be symmetric. For example, as shown in Figure 1, the penalty for undershooting the interval is assumed to be more severe than that of overshooting.

3.3 The Expected Total Disutility

With $c(d_\alpha)$ and $L(d_\alpha, x)$ thus specified, there remains one

value of $x$ is not known, and thus $L(d_\alpha, x)$ needs to be averaged over the possible values that $x$ can take. This is easy to do because the predictive distribution of $X$ has to be specified. Accordingly, let

$$R(d_\alpha) = E_X[L(d_\alpha, x)] \qquad (3)$$

be the expected value of $L(d_\alpha, x)$. In decision theory, $R(d_\alpha)$ is known as the risk function; it is free of $X$. $R(d_\alpha)$ encapsulates the risk of noncoverage by an interval of width $d_\alpha$, with $R(d_\alpha)$ decreasing in $d_\alpha$.

Since $c(d_\alpha)$ is devoid of unknown quantities (indeed, $d_\alpha$ is a decision variable), the matter of taking an expectation of $c(d_\alpha)$ is moot. We may now combine $c(d_\alpha)$ and $R(d_\alpha)$ to obtain the total expected disutility function as

$$D(d_\alpha) = c(d_\alpha) + R(d_\alpha). \qquad (4)$$

As mentioned before, the additive choice, albeit natural, is not binding. We choose that value of $\alpha$ for which $D(d_\alpha)$ is a minimum. This is described next.

4. CHOOSING AN OPTIMUM COVERAGE PROBABILITY

To make matters concrete, suppose that $c(d_\alpha) = \sqrt{d_\alpha}$, so that the $p$ of Equation (1) is 1/2. Also, since $d_\alpha = 2z_{\alpha/2}\sigma$, $U_\alpha = \mu + z_{\alpha/2}\sigma$ can be written as $U_\alpha = \mu + d_\alpha/2$; similarly, $L_\alpha = \mu - d_\alpha/2$.

For the $f_1$ and $f_2$ of Equation (2), we let $f_1(x - U_\alpha) = (x - U_\alpha)^2/40$ and $f_2(L_\alpha - x) = (L_\alpha - x)^2/10$. These choices encapsulate a squared-error disutility, and make $f_1$ and $f_2$ asymmetric with respect to each other. Writing $U_\alpha$ and $L_\alpha$ in terms of $d_\alpha$, we have $f_1(x - U_\alpha) = (x - \mu - d_\alpha/2)^2/40$ and $f_2(L_\alpha - x) = (\mu - d_\alpha/2 - x)^2/10$.

To compute the risk function of Equation (3), we need to specify $\mu$ and $\sigma^2$ of the normal distribution of $X$. Based on a Bayesian time series analysis of some student retention data, these were determined to be $\mu = 2140$ and $\sigma^2 = 396$. With the above in place, we may compute the total expected disutility as

$$D(d_\alpha) = \sqrt{d_\alpha} + R(d_\alpha),$$
caveat  that  needs  to  be  addressed.  When  the  a  is  chosen,  the 


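where

$$
R(d_\alpha) = \int_{\mu + d_\alpha/2}^{\infty} \frac{(x - \mu - d_\alpha/2)^2}{40}\, f(x)\,dx
+ \int_{-\infty}^{\mu - d_\alpha/2} \frac{(\mu - d_\alpha/2 - x)^2}{10}\, f(x)\,dx,
$$

where $f(x)$ is the probability density at $x$ of a normally distributed random variable with mean $\mu$ and variance $\sigma^2$.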

The computation of $R(d_\alpha)$ has to be done numerically, and a plot of $D(d_\alpha)$ versus $d_\alpha$, for $d_\alpha > 0$, is shown in Figure 2.

Figure 2. Total expected disutility versus $d_\alpha$.

An examination of Figure 2 shows that $D(d_\alpha)$ attains its minimum at $d_\alpha = 62$. This suggests, via the relationship $d_\alpha = 2 z_{\alpha/2}\sigma$ with $\sigma^2 = 396$ and a table look-up in the standard normal distribution, that the optimal coverage probability for this scenario is 0.88. Using coverage probabilities other than 0.88 (roughly 0.90), say the conventional 0.95 or 0.99, would yield a wider interval, but the utility of such intervals would be less than that provided by the 0.88 coverage probability.
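The minimization of Section 4 can be reproduced in a few lines. The sketch below is our own illustration, not the authors' code. It assumes, as in the text, that $X \sim N(\mu, \sigma^2)$ with $\sigma^2 = 396$, that $c(d_\alpha) = \sqrt{d_\alpha}$, and that the squared-error weights are 1/40 and 1/10. Instead of quadrature, it uses the standard normal partial-moment identity $E[(Z - s)^2; Z > s] = (1 + s^2)(1 - \Phi(s)) - s\phi(s)$. By symmetry this gives both tail integrals of $R(d_\alpha)$ with $s = d_\alpha/(2\sigma)$, so $\mu$ drops out. The function names and the brute-force grid search are hypothetical choices.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Z ~ N(0, 1)


def tail_sq_moment(s):
    """E[(Z - s)^2; Z > s] for standard normal Z: (1 + s^2) Q(s) - s phi(s)."""
    q = 1.0 - STD_NORMAL.cdf(s)
    return (1.0 + s * s) * q - s * STD_NORMAL.pdf(s)


def risk(d, sigma, w_over=1 / 40, w_under=1 / 10):
    """R(d) of Equation (3) with f1(u) = u^2/40 and f2(u) = u^2/10.

    For X ~ N(mu, sigma^2) and the interval mu +/- d/2, each tail contributes
    sigma^2 * tail_sq_moment(d / (2 sigma)) times its weight.
    """
    s = (d / 2.0) / sigma
    return (w_over + w_under) * sigma ** 2 * tail_sq_moment(s)


def total_disutility(d, sigma):
    """D(d) = c(d) + R(d) with c(d) = sqrt(d), i.e. p = 1/2 in Equation (1)."""
    return d ** 0.5 + risk(d, sigma)


def optimal_width(sigma, d_max=200.0, step=0.01):
    """Grid search for the minimiser of D(d) on (0, d_max]."""
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return min(grid, key=lambda d: total_disutility(d, sigma))


def coverage_from_width(d, sigma):
    """Invert d = 2 z_{alpha/2} sigma to obtain the coverage probability 1 - alpha."""
    return 2.0 * STD_NORMAL.cdf(d / (2.0 * sigma)) - 1.0


sigma = 396 ** 0.5  # sigma^2 = 396 from the student-retention example
d_star = optimal_width(sigma)
print(f"optimal width = {d_star:.2f}, coverage = {coverage_from_width(d_star, sigma):.3f}")
```

When run as written, the search should land close to $d_\alpha \approx 62$ and a coverage of about 0.88, in line with Figure 2. A global grid is used instead of a local optimizer because the $\sqrt{d_\alpha}$ term makes $D(d_\alpha)$ rise steeply just above $d_\alpha = 0$ before it falls, so a local search started near zero could move the wrong way.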

5. GENERALITY OF THE APPROACH AND SOME CAVEATS

The proposed approach is general because it rests on the simple principle of minimizing $D(d_\alpha)$, the total expected disutility function of Equation (4). If $D(d_\alpha)$ attains a unique minimum, then a unique optimal coverage probability can be arrived upon. If the minimum is not unique, then several optimal coverage probabilities will result, and the user is free to choose any one of these. There could be circumstances under which $D(d_\alpha)$ will not attain a minimum, and the method will fail to produce an answer. The optimality conditions that ensure a minimum value of $D(d_\alpha)$ need to be formally addressed, but with $c(d_\alpha)$ monotonic and concave (or convex), and with $L(d_\alpha, x)$ U-shaped as shown in Figure 1, $D(d_\alpha)$ will indeed attain a minimum. The choice of $L(d_\alpha, x)$ prescribed in Equation (2) is quite general. It is easily adaptable to one-sided intervals, and also to the inventory and banking scenarios mentioned before. Furthermore, it is conventional in life-length prediction studies and in statistical inference, wherein squared-error loss is a common assumption.

The assumed distribution of $X$ with specified parameters plays two roles. One is to average out $L(d_\alpha, x)$ to produce the risk function $R(d_\alpha)$. In this role the choice of the distribution of $X$ is not restrictive, because its purpose here is merely to serve as a weighting function. Any well-known distribution can be used, especially when $R(d_\alpha)$ is obtained via numerical methods, as we have done with the normal. By contrast, frequentist prediction intervals that entail pivotal methods limit the choice of distributions. The second role played by the distribution of $X$ is to facilitate a relationship between $d_\alpha$ and $\alpha$. In the case of the normal distribution with mean $\mu$ and variance $\sigma^2$, $d_\alpha = 2 z_{\alpha/2}\sigma$; here $\mu$ does not matter. This type of relationship will arise with any symmetrical distribution, such as Student's $t$, the triangular, the uniform, the Laplace, etc. A relationship between $d_\alpha$ and $\alpha$ in the case of the exponential with scale $\lambda$, that is, $P(X > x) = e^{-\lambda x}$, turns out to be quite straightforward, indeed more direct than that encountered with the normal. Specifically, for the interval whose endpoints are the $\alpha/2$ and $1 - \alpha/2$ quantiles,

$$d_\alpha = \frac{1}{\lambda}\log\left[\frac{2 - \alpha}{\alpha}\right].$$

By suitable transformations, the case of other skewed distributions such as the lognormal, the Weibull, and the chi-squared can be similarly treated. A difficult case in point is the Pareto distribution (popular in financial mathematics), wherein $P(X > x \mid \psi, \beta) = (\psi/(\psi + x))^{\beta}$. Here

$$d_\alpha = \psi\left[(\alpha/2)^{-1/\beta} - (1 - \alpha/2)^{-1/\beta}\right],$$

and the relationship between $d_\alpha$ and $\alpha$ is too involved for the method to be directly invoked.
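The two width formulas above can be checked and inverted numerically. In this sketch, whose function names are hypothetical, the exponential inverse has a closed form, $\alpha = 2/(1 + e^{\lambda d_\alpha})$. For the Pareto it falls back on bisection, which works because $d_\alpha$ is decreasing in $\alpha$. Both cases assume the equal-tailed interval with endpoints at the $\alpha/2$ and $1 - \alpha/2$ quantiles.

```python
import math


def d_exponential(alpha, lam):
    """Width of the equal-tailed (1 - alpha) interval when P(X > x) = exp(-lam x)."""
    return math.log((2.0 - alpha) / alpha) / lam


def alpha_exponential(d, lam):
    """Closed-form inverse: (2 - alpha) / alpha = exp(lam d)  =>  alpha = 2 / (1 + exp(lam d))."""
    return 2.0 / (1.0 + math.exp(lam * d))


def d_pareto(alpha, psi, beta):
    """Width when P(X > x) = (psi / (psi + x))^beta, endpoints at the alpha/2 and 1 - alpha/2 quantiles."""
    return psi * ((alpha / 2.0) ** (-1.0 / beta) - (1.0 - alpha / 2.0) ** (-1.0 / beta))


def alpha_pareto(d, psi, beta, tol=1e-12):
    """No closed-form inverse, but d_pareto decreases in alpha, so bisect on (0, 1)."""
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_pareto(mid, psi, beta) > d:
            lo = mid  # interval still too wide: alpha must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(d_exponential(0.1, 0.5), alpha_pareto(d_pareto(0.1, 3.0, 2.5), 3.0, 2.5))
```

A numerical inversion like this is one possible way around the lack of a closed form: one would minimize $D(d_\alpha)$ over $d_\alpha$ and then map the optimal width back to $\alpha$.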

Finally, besides the caveat of $D(d_\alpha)$ not having a minimum, the other caveat is the dependence of an optimal coverage probability on data. Specifically, the use of a posterior distribution of $X$ to obtain $R(d_\alpha)$ makes this latter quantity depend on the observed data, with the consequence that in the same problem one could conceivably end up using a different coverage probability from forecast to forecast. Unattractive as this may sound, it is the price that one must pay to ensure coherence. However, this dependence on the data becomes of less concern once the posterior distribution of $X$ converges, so that the effect of new data on the posterior diminishes. The same situation will also arise when the distribution of $X$ is specified via a frequentist approach involving a plug-in rule.

6.  SUMMARY  AND  CONCLUSIONS 

The thesis of this article is that coverage probabilities for prediction intervals should be chosen on the basis of decision-theoretic considerations. The current practice is to choose them by convention or astute judgment. Prediction intervals are one of the essentials of regression, time series, and state space models. They also occur in conjunction with previsions based on probability models entailing the judgment of exchangeability. Furthermore, the principles underlying the construction of prediction intervals share some commonality with those involving inventory planning and banking reserves.

The decision-theoretic approach boils down to the minimization of total expected disutility. This disutility consists of two components. One is a disutility associated with the width of the interval, and the other is associated with the failure of an interval to cover the observed value when it reveals itself. The proposed approach is illustrated via a consideration of stylized utility functions. It can be seen as a prototype for approaches based on other utility functions. The approach also entails a use of the normal distribution to describe the uncertainties. Again, this distributional assumption is not essential; other distributions will work equally well.

We emphasize that the material here pertains to prediction intervals, not confidence intervals. It would be interesting to develop a decision-theoretic approach for choosing the confidence coefficient of a confidence interval. To the best of our knowledge, this remains to be satisfactorily done.

[Received  June  2007.  Revised  December  2007.] 
Abstract

The point of view that we adopt here is that damage is an abstract notion that conveys an intuitive import. Specifically, an item fails when its damage exceeds a threshold. Whereas damage cannot be observed and measured, the surrogates that it spawns, such as crack growth, CD4 cell counts, and wear, can be. These surrogates are known as markers. With the above in mind, we offer here a probabilistic architecture that endeavors to make the notion of damage precise and still retain its intuitive import. Our architecture looks at damage as cumulative hazard and describes its evolution via a nondecreasing stochastic process. The observable markers associated with damage are also modeled by a stochastic process that is cross-correlated with the damage process. Thus a bivariate stochastic process, with one component that is nondecreasing and the other that may fluctuate around some mean, could be a suitable model for encapsulating the phenomenon of damage and its markers. The latent nondecreasing process leads to the fluctuating observable processes, and an item fails when the former hits a threshold. The second feature of our architecture pertains to the threshold. We argue that the threshold needs to be random and has an exponential distribution with scale one; we call such a threshold the hazard potential of an item. To conclude, our perspective on what constitutes a damage process is starkly different from that which is prevalent in the reliability and the survival analysis literatures. Hopefully, it offers a platform for describing an ill-defined but much-discussed phenomenon.

Keywords: aging; crack growth; degradation; deterioration; fatigue; gamma processes; hazard potential; health status; latent variables; marker processes; quality of life; reliability; stochastic integral; stochastic processes; surrogates; survival analysis; wear; Wiener process
Damage Processes

Introduction  and  Background 

There is extensive and burgeoning material on the topic of damage and its associated factors like aging, cumulative damage, degradation, deterioration, fatigue, health status, and quality of life. This material appears in both the biostatistical and the engineering reliability literatures. However, these notions suffer from the feature that they lack a precise definition. Rather, they convey an abstract but intuitive import in the sense of a decrease in residual (or remaining) life. This decrease in residual life is conceptualized via the feature that an item experiencing aging and degradation will fail when the damage hits some barrier or threshold. Alternatively, it is supposed that at inception every item is endowed with a resource that gets depleted because of damage, and that the item fails when the resource gets exhausted. Thus, for example, to engineers like Bogdanoff and Kozin [1], “Degradation is the irreversible accumulation of damage throughout life that leads to failure.” The term damage is not made precise; however, it is claimed that damage reveals itself via surrogates or markers, such as cracks that grow in size, corrosion, measured wear (i.e., depletion of material), and so on. Similarly, Sobczyk [2] sees fatigue as “a phenomenon which takes place in units experiencing time-varying external actions which manifest in a deterioration of the unit’s resistance to carry its intended loading”. In the biostatistical literature, aging pertains to a unit’s position in a state space wherein the probabilities of failure are greater than at its former position. Aging manifests itself in terms of biomedical and physical difficulties experienced by individuals, and, in certain scenarios, via things like low CD4 cell counts; these serve as biomedical surrogates, or what are known as biomarkers.

The markers mentioned above are, in most cases, observable and measurable entities. Much of the recent work on what is known as degradation modeling centers around assessing lifetimes via an analysis of the observed markers and their hitting times to a threshold (cf. Doksum [3], Doksum and Normand [4], Lu and Meeker [5], Ebrahimi [6], and Lehmann [7]). However, treating the observable markers as substitutes for the unobservable degradation process that actually causes failure is tantamount to putting the cart before the horse. This is because the unobservable degradation process spawns the observable marker process, and is therefore its cause. An exception, however, is the work of Whitmore et al. [8] and of Lee et al. [9], who treat the degradation and the marker as separate but related processes. Also see Nair [10], who makes the point that data on the observable surrogates of degradation help sharpen lifetime assessments. In this vein, a noteworthy contribution is by Cox [11], who systematically articulates the roles that the observable and the unobservable play in lifetime assessments. The premise upon which our bivariate stochastic process model with a random threshold is based has been inspired by the papers of Whitmore et al. [8] and Cox [11], and by our work on the hazard potential (see Singpurwalla [12; 13, p. 79]).

Preliminaries:  The  Hazard  Potential 

For an appreciation of the bivariate stochastic process model as a description of the damage process, some preliminaries on the notion of a hazard potential would be helpful. Accordingly, let $T$ denote the lifetime of a unit and let $h(t)$ be the hazard rate of $P(T > t)$, $t \ge 0$; let $H(t) = \int_0^t h(u)\,du$ be the cumulative hazard function at $t$. Then it is easy to see that

$$P(T > t;\ h(t), t \ge 0) = \exp(-H(t)) = P(X > H(t)), \qquad (1)$$

where $X$ has an exponential distribution with scale one. The random variable $X$ is called the hazard potential of the item, and it represents an unknown “resource” that the item is endowed with at inception. Furthermore, $H(t)$ is a measure of the amount of resource consumed by time $t$, and $h(t)$ is the rate at which the resource is consumed at $t$. The unit fails when $H(t)$ exceeds $X$, that is, when $H(t)$ hits the random threshold $X$.
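Equation (1) says that the following procedure reproduces the survival function $\exp(-H(t))$: draw a unit exponential hazard potential $X$ and declare failure when $H(t)$ first reaches it. Here is a minimal Monte Carlo check. It assumes, for illustration only, a Weibull cumulative hazard $H(t) = (t/\eta)^k$, which is not a choice made in the text. The function names are hypothetical.

```python
import math
import random


def cumulative_hazard(t, k, eta):
    """Illustrative Weibull cumulative hazard H(t) = (t / eta) ** k."""
    return (t / eta) ** k


def failure_time(x, k, eta):
    """Time at which H(t) reaches the hazard potential x, i.e. T = H^{-1}(x)."""
    return eta * x ** (1.0 / k)


def simulate_survival(t, k, eta, n=200_000, seed=1):
    """Monte Carlo estimate of P(T > t) with hazard potential X ~ Exp(1)."""
    rng = random.Random(seed)
    alive = sum(failure_time(rng.expovariate(1.0), k, eta) > t for _ in range(n))
    return alive / n


k, eta, t = 2.0, 10.0, 8.0
estimate = simulate_survival(t, k, eta)
exact = math.exp(-cumulative_hazard(t, k, eta))  # Equation (1)
print(estimate, exact)
```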

When the rate at which a unit’s resource gets consumed is random, $h(t)$ is described by a stochastic process, making $\{H(t); t \ge 0\}$ a stochastic process as well. However, this latter process has to be nondecreasing. The unit fails when the process $\{H(t); t \ge 0\}$ hits a barrier $X$, where $X$ is also random with a unit exponential distribution. Candidate stochastic processes for $\{H(t); t \ge 0\}$ are also alluded to in Singpurwalla [12]. Since $H(t)$ is, in principle, nondecreasing in $t$, $t \ge 0$, the process $\{H(t); t \ge 0\}$ is a candidate for describing a damage process. Furthermore, since the conventional view claims that an item fails when the damage hits a threshold, the cumulative hazard and the damage reflect a parallel feature. This motivates us to view the (cumulative) damage as being isomorphic with the cumulative hazard. Doing so makes our perspective different from that which is currently being discussed in the literature. Like (cumulative) damage, the cumulative hazard is not observable. However, the cumulative hazard does influence the time to failure. Consequently, the cumulative hazard and the (cumulative) damage are to be seen as latent variables, and, for that matter, so is the hazard potential $X$.
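When $H$ is itself random, conditioning on the path in Equation (1) turns the survival probability into an average over paths, $P(T > t) = E[\exp(-H(t))]$. The sketch below takes, purely as an example, a gamma process for $\{H(t)\}$, which is nondecreasing as required. If $H(t) \sim \text{Gamma}(\kappa t, \theta)$ (shape $\kappa t$, scale $\theta$), then $E[\exp(-H(t))] = (1 + \theta)^{-\kappa t}$. The parameters $\kappa$ and $\theta$ and the function names are our assumptions.

```python
import random


def survival_mc(t, kappa, theta, n_paths=5000, n_steps=50, seed=7):
    """Monte Carlo P(T > t): H is a gamma process, X ~ Exp(1) is the hazard potential.

    H accumulates independent Gamma(kappa * dt, theta) increments, so it is
    nondecreasing and H(t) ~ Gamma(kappa * t, theta).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    alive = 0
    for _ in range(n_paths):
        x = rng.expovariate(1.0)
        h = 0.0
        for _ in range(n_steps):
            h += rng.gammavariate(kappa * dt, theta)
            if h >= x:  # cumulative hazard (damage) has hit the random threshold
                break
        else:
            alive += 1  # no hit by time t: the unit survived
    return alive / n_paths


def survival_exact(t, kappa, theta):
    """E[exp(-H(t))] = (1 + theta) ** (-kappa * t) for H(t) ~ Gamma(kappa t, theta)."""
    return (1.0 + theta) ** (-kappa * t)


print(survival_mc(2.0, 1.0, 0.5), survival_exact(2.0, 1.0, 0.5))
```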

A Stochastic Process Model for Damage and its Markers

Because markers are closely linked with damage, any suitable model for the damage process should be accompanied by some sort of description for the markers as well. The most general way to do this would be to assume that the markers are realizations of stochastic processes, just as the (cumulative) damage is a stochastic process. The simplest way to proceed would be to suppose that there is only one marker to focus attention upon, so that a bivariate stochastic process $\{H(t), Z(t); t \ge 0\}$ would be a suitable description of the damage and its marker. As stated in the section titled “Preliminaries: The Hazard Potential”, the process $\{H(t); t \ge 0\}$ is nondecreasing in $t$. However, the process $\{Z(t); t \ge 0\}$ need not be restricted to being nondecreasing. Indeed, markers such as crack growth and CD4 cell counts fluctuate around some trend, and thus one is free to choose any suitable model for the process $\{Z(t); t \ge 0\}$. A Wiener process appears to be the model of choice, but this need not be so.

Thus, to summarize, our proposed model for the (cumulative) damage and its associated marker is a bivariate stochastic process $\{H(t), Z(t); t \ge 0\}$ with $H(t)$ nondecreasing in $t$, and $Z(t)$ free to fluctuate around some constant or trend. We term such a process a degradation process. Since $H(t)$ spawns $Z(t)$, the two processes $\{H(t); t \ge 0\}$ and $\{Z(t); t \ge 0\}$ need to be linked; that is, they need to be cross-correlated. Without such linkage, the marker process cannot serve as a predictor of failure, and the statistical exercise of degradation modeling is not meaningful. One way to achieve this linkage is to describe $\{Z(t); t \ge 0\}$ by a Wiener process, and the unobservable (cumulative) damage process by a Wiener maximum process, namely,

$$H(t) = \sup_{0 \le s \le t} \{Z(s)\}. \qquad (2)$$

This strategy has been proposed in Singpurwalla [14], wherein a Bayesian approach for inference about lifetimes, using data on the marker process, is also described. The item fails when $H(t)$ hits the (random) threshold $X$. Whereas the model of equation (2) could be a starting point, there is a caveat that needs to be addressed. Specifically, since $Z(t)$ is spawned by $H(t)$, the latter is the cause of the former. This means that $H(t)$ must lead $Z(t)$, and so any linkage between the two processes in question should incorporate a time lag. The model of equation (2) does not do this, because here $H(t)$ is determined retrospectively from $Z(t)$ and therefore lags $Z(t)$, instead of the other way around. Thus $H(t)$ and $Z(t)$ need to be connected, with the observable $Z(t)$ lagging the unobservable $H(t)$. This is a possible topic for future research.
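The Wiener maximum construction of Equation (2) is easy to see on a simulated path, and so is the lag caveat. Each value of $H$ is a running maximum over past values of $Z$, so $H(t)$ can only be computed once $Z$ has been observed up to $t$. This is an illustrative sketch. It uses hypothetical function names and a driftless standard Wiener process for $Z$.

```python
import math
import random


def wiener_path(t_max, n_steps, rng):
    """Standard Wiener process Z on a uniform grid, with Z(0) = 0."""
    dt = t_max / n_steps
    z = [0.0]
    for _ in range(n_steps):
        z.append(z[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return z


def running_max(z):
    """H(t) = sup_{0 <= s <= t} Z(s), Equation (2): built only from past values of Z."""
    h, best = [], -math.inf
    for value in z:
        best = max(best, value)
        h.append(best)
    return h


def failure_index(h, threshold):
    """First grid index at which H reaches the hazard potential, or None if it never does."""
    for k, value in enumerate(h):
        if value >= threshold:
            return k
    return None


rng = random.Random(3)
z = wiener_path(t_max=5.0, n_steps=5000, rng=rng)
h = running_max(z)
x = rng.expovariate(1.0)  # hazard potential X ~ Exp(1)
print(x, failure_index(h, x))
```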

In the section titled “Candidate Processes for Damage and Markers”, we give an overview of some modeling strategies that have been proposed for the damage process $\{H(t); t \ge 0\}$, as well as for the marker process $\{Z(t); t \ge 0\}$, when each is treated separately, that is, when no distinction is made between the damage (or degradation) process and the marker process. Supplementary material on the above can also be found in Chapters 7 and 8 of Singpurwalla [13].

Candidate Processes for Damage and Markers

The origins of the work on threshold crossing of cumulative damage as a basis for failure go back to Epstein [15], Esary [16], and Gaver [17]. The idea of describing cumulative damage as a stochastic process can be traced, to the best of our knowledge, to Cox [18, p. 91], and to the Ph.D. thesis of Morey [19]. However, the granddaddy of all work on damage processes is the remarkable paper of Esary et al. [20], who (without articulating what damage means) describe damage by a compound Poisson process with increments that are positive and have the Markov property. Failure occurs when the said process hits a random barrier whose distribution is exponential. The choice of an exponential distribution for the barrier is arbitrary, and Esary et al. [20] show that the hitting time has an exponential distribution. More recently, Zacks [21, 22] has elaborated on this theme.
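A short argument makes the exponential hitting time plausible in a simple special case. Assume, for illustration only, three things: shocks arrive as a Poisson process with rate $\nu$, shock sizes $Y$ are i.i.d. exponential with mean $m$, and the barrier is exponential with rate $\theta$. Because the barrier is memoryless, each shock is fatal with probability $1 - E[e^{-\theta Y}] = \theta m/(1 + \theta m)$, independently of the damage accumulated so far. Hence $T$ is exponential with rate $\nu\theta m/(1 + \theta m)$. The Monte Carlo sketch below checks this. It is our own illustration, not the construction of Esary et al. [20], and its names are hypothetical.

```python
import math
import random


def hitting_time(rng, shock_rate, mean_shock, barrier_rate):
    """One draw of T: damage jumps by Exp(mean_shock) amounts at Poisson(shock_rate)
    epochs, and the item fails at the first epoch where damage >= X ~ Exp(barrier_rate)."""
    barrier = rng.expovariate(barrier_rate)
    time, damage = 0.0, 0.0
    while True:
        time += rng.expovariate(shock_rate)           # next shock epoch
        damage += rng.expovariate(1.0 / mean_shock)   # positive damage increment
        if damage >= barrier:
            return time


def exponential_rate(shock_rate, mean_shock, barrier_rate):
    """nu * P(shock is fatal) = nu * theta m / (1 + theta m)."""
    p_fatal = barrier_rate * mean_shock / (1.0 + barrier_rate * mean_shock)
    return shock_rate * p_fatal


rng = random.Random(11)
nu, m, theta = 2.0, 0.5, 1.0
draws = [hitting_time(rng, nu, m, theta) for _ in range(20_000)]
rate = exponential_rate(nu, m, theta)
print(sum(draws) / len(draws), 1.0 / rate)  # sample mean vs exponential mean
```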

In the biostatistical arena there is a setup parallel to that of Esary et al. [20], which does not allude to damage, deterioration, or aging, but to the number of mice, at some time $t$, that have typhoid organisms. The growth of such mice is described by a pure birth process, and the first passage time to a barrier is investigated. The specifics are in Cox and Miller [23, p. 160], and in Cox [11].

The Esary et al. [20] architecture is enhanced by Lemoine and Wenocur [24], who model wear (i.e., damage) by a suitable random process, but who also allow for failure due to trauma. The latter is described by a Poisson process, the rate of which depends on the state of wear. An item fails when the wear reaches a threshold or when the item experiences fatal trauma. Thus, in the model of Lemoine and Wenocur [24], the wear and the trauma processes compete with each other for an item’s lifetime. The random process considered by the above authors is a diffusion process that is driven by a Wiener process. In a subsequent paper, Lemoine and Wenocur [25] describe wear by a shot-noise process. A disadvantage of the diffusion and the shot-noise processes is that the wear (to us, damage) is not monotonically nondecreasing. To rectify this deficiency, Wenocur [26] considers a gamma process for describing wear. His development of the gamma process proceeds along the following lines.

Partition the time interval into subintervals of length $h$, and let $X(n)$ denote the damage (or wear) at time $nh$, $n = 1, 2, \ldots$. Suppose that the damage at time $(n+1)h$ is prescribed via the relationship

$$X(n+1) - X(n) = \alpha(X(n))\,\epsilon_n + \beta(X(n))\,h, \qquad (3)$$

where $\alpha(\cdot)$ and $\beta(\cdot)$ are given functions, and $\{\epsilon_n\}$ is a sequence of independent and identically distributed random variables having a gamma distribution with shape parameter $h > 0$. Letting $h \downarrow 0$, we have

$$dX(t) = \alpha(X(t^-))\,dY(t) + \beta(X(t^-))\,dt, \qquad (4)$$

where $\{Y(t)\}$ is a gamma process.

In integral terms, equation (4) becomes the stochastic integral

$$X(t) = X(0) + \int_0^t \alpha(X(s^-))\,dY(s) + \int_0^t \beta(X(s^-))\,ds. \qquad (5)$$

Since the gamma process has nonnegative increments, the wear (or damage) process is increasing. For an overview of gamma processes and their constructions, see Singpurwalla [27] or van der Weide [28]. Whereas a gamma process model may be attractive in scenarios wherein the damage-causing shocks occur frequently, the models by Zacks [21, 22] for the compound Poisson process case and for the compound renewal process case, respectively, seem to be more appropriate when the shocks are infrequent.
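The discretization in Equation (3) can be simulated directly. The sketch below iterates it with $\epsilon_n \sim \text{Gamma}(h, 1)$. The unit scale, the example coefficients $\alpha(x) = 1 + 0.1x$ and $\beta(x) = 0.05$, and the helper name are all hypothetical. As long as $\alpha(\cdot)$ and $\beta(\cdot)$ are nonnegative, every increment is nonnegative and the simulated wear is nondecreasing, which is the point of Wenocur's choice.

```python
import random


def wenocur_path(x0, t_max, h, alpha, beta, rng):
    """Iterate Equation (3): X(n+1) = X(n) + alpha(X(n)) * eps_n + beta(X(n)) * h.

    eps_n ~ Gamma(shape=h, scale=1); the unit scale is an assumption.
    """
    path = [x0]
    for _ in range(int(round(t_max / h))):
        eps = rng.gammavariate(h, 1.0)
        x = path[-1]
        path.append(x + alpha(x) * eps + beta(x) * h)
    return path


rng = random.Random(5)
# Hypothetical state-dependent coefficients: more wear accelerates further wear.
wear = wenocur_path(0.0, 2.0, 0.01, lambda x: 1.0 + 0.1 * x, lambda x: 0.05, rng)

# Sanity check: alpha = 1, beta = 0 makes X(1) - X(0) a sum of Gamma(0.01, 1)
# variables, i.e. exactly Gamma(1, 1), with mean 1.
ends = [wenocur_path(0.0, 1.0, 0.01, lambda x: 1.0, lambda x: 0.0, rng)[-1] for _ in range(2000)]
print(wear[-1], sum(ends) / len(ends))
```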


Candidate  Marker  Processes 

In engineering reliability, an archetypical marker process is crack growth, whereas in biostatistical studies it appears that CD4 cell count is a commonly mentioned biomarker. With archetypical markers come archetypical stochastic processes for $\{Z(t); t \ge 0\}$, and one such process is the Wiener process with a drift parameter $\mu$ and a diffusion parameter $\sigma^2 > 0$; see Doksum [3] and Whitmore et al. [8]. As mentioned before, the marker is often viewed as a proxy for damage, and failure is said to occur when the marker process hits a threshold. As is well known, the hitting time to the threshold (assumed fixed and known) of a Wiener process has an inverse Gaussian distribution (see Singpurwalla [13, pp. 68 and 136] for a discussion of this distribution).
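Consider a Wiener marker $Z(t) = \mu t + \sigma W(t)$ with $\mu > 0$, started at 0. The hitting time of a fixed level $a > 0$ is then inverse Gaussian with mean $a/\mu$ and variance $a\sigma^2/\mu^3$. Below is a rough Monte Carlo check with illustrative parameters. It uses an Euler grid of step 0.01, which slightly delays detected crossings and so biases the simulated times upward a little. The names are hypothetical.

```python
import math
import random


def first_passage(a, mu, sigma, dt, rng):
    """Euler-discretized first time Z(t) = mu t + sigma W(t) reaches the level a."""
    z, t = 0.0, 0.0
    sd = sigma * math.sqrt(dt)
    while z < a:
        z += mu * dt + rng.gauss(0.0, sd)
        t += dt
    return t


def inverse_gaussian_moments(a, mu, sigma):
    """Mean a / mu and variance a sigma^2 / mu^3 of the hitting time (requires mu > 0)."""
    return a / mu, a * sigma ** 2 / mu ** 3


rng = random.Random(4)
a, mu, sigma = 5.0, 1.0, 1.0
times = [first_passage(a, mu, sigma, 0.01, rng) for _ in range(1500)]
mean_t = sum(times) / len(times)
var_t = sum((x - mean_t) ** 2 for x in times) / (len(times) - 1)
print(mean_t, var_t, inverse_gaussian_moments(a, mu, sigma))
```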

The Wiener process has independent increments, and so does a gamma process. This amounts to saying that the increments of crack size are independent of the existing crack length. This latter phenomenon is not always true: the bigger the crack, the bigger is its growth. This motivates one to consider transformations of the Wiener process. Furthermore, the crack growth phenomenon also exhibits abrupt growth, and the Wiener process does not encapsulate such abruptness of growth. With the above in mind, Schabe [29] proposes the following as a model for $X(t)$, the size of a crack at time $t$, $t \ge 0$. Let $X(t) = (A(t))^a$, where

$$A(t) = bt + W(t) + \beta P(t). \qquad (6)$$

Here $W(t)$ is a Wiener process with variance $\sigma^2 t$, and $P(t)$ is a Poisson process with intensity $\lambda$. The constant $b > 0$ describes a trend, and the constant $a$ is such that $a > 1$ ($a < 1$) encapsulates a progressive (regressive) velocity with which the crack grows. Under the model of equation (6), Schabe [29] obtains the distribution of the hitting time of $X(t)$ to $h > 0$, a barrier. This distribution does not have a closed-form solution. However, the mean and the variance of this distribution are available; these are

$$\frac{h^{1/a}}{b + \beta\lambda} \quad \text{and} \quad \frac{h^{1/a}\,(\sigma^2 + \lambda\beta^2)}{(b + \beta\lambda)^3},$$

respectively.
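Schabe's two moments coincide with the first-passage mean and variance of a Brownian motion evaluated at the level $h^{1/a}$, with drift $b + \beta\lambda$ and variance rate $\sigma^2 + \lambda\beta^2$. We read them as a diffusion-type approximation, and we read $\beta$ as the constant jump size. Both readings are our interpretation rather than statements in the text. The sketch evaluates the two formulas and compares the mean with a crude grid simulation of Equation (6). The parameter values and names are hypothetical.

```python
import math
import random


def schabe_moments(h, a, b, sigma2, lam, beta):
    """Mean and variance of the hitting time of X(t) = A(t)^a to the barrier h."""
    level = h ** (1.0 / a)   # for A >= 0: X(t) >= h  <=>  A(t) >= h^(1/a)
    drift = b + beta * lam   # mean rate of A(t)
    return level / drift, level * (sigma2 + lam * beta ** 2) / drift ** 3


def schabe_hitting_time(h, a, b, sigma2, lam, beta, dt, rng):
    """Simulate A(t) = b t + W(t) + beta P(t) on a grid until A(t) reaches h^(1/a)."""
    level = h ** (1.0 / a)
    sd = math.sqrt(sigma2 * dt)
    a_t, t = 0.0, 0.0
    while a_t < level:
        a_t += b * dt + rng.gauss(0.0, sd)
        if rng.random() < lam * dt:  # at most one Poisson jump per small step
            a_t += beta
        t += dt
    return t


params = dict(h=16.0, a=2.0, b=1.0, sigma2=0.25, lam=0.5, beta=0.2)
mean_t, var_t = schabe_moments(**params)
rng = random.Random(8)
sims = [schabe_hitting_time(**params, dt=0.01, rng=rng) for _ in range(1000)]
print(mean_t, var_t, sum(sims) / len(sims))
```

Jumps and grid discreteness make the crossing overshoot the level slightly, so the simulated mean should sit a little above the formula.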

In Ebrahimi [6], a strategy that parallels those of Lemoine and Wenocur [24] and of Schabe [29] is taken, but instead of looking at the growth of a single crack, an ensemble of $k$ cracks, each having its own growth rate, is considered. Specifically, it is supposed that the growth of the $i$th crack, $i = 1, \ldots, k$, is governed by the stochastic differential equation

$$dX_i(t) = X_i(t)\,\lambda_i(t)\,dt + \sigma X_i(t)\,dW(t), \qquad (7)$$

where $\lambda_i(t)$ is the growth rate of the $i$th crack, $\sigma > 0$ is a constant, and $\{W(t); t \ge 0\}$ is a standard Wiener process with mean 0 and variance $t$. Using standard results (i.e., the Itô formula), it can be seen that

$$X_i(t) = X(0)\exp\left[\Lambda_i(t) - \frac{\sigma^2}{2}t + \sigma W(t)\right], \qquad (8)$$

where $\Lambda_i(t) = \int_0^t \lambda_i(s)\,ds$, and $X(0)$ is the initial crack size, assumed to be known and the same for all $k$ cracks. The item fails when the size of the largest crack hits some threshold, say $a$. If $T$ denotes the passage time of the largest crack to the threshold $a$, then it can be seen that

$$P(T > t) = P\left[W(u) < \min_{1 \le i \le k}\left(\frac{\sigma}{2}u - \frac{\Lambda_i(u)}{\sigma}\right) + c \ \text{ for all } 0 \le u \le t\right], \qquad (9)$$

where $c = b/\sigma$ and $b = \log a - \log X(0)$.

Computation of the above follows from results on the times taken by the Wiener process to hit a threshold.
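Equation (9) is an algebraic restatement of the event that the largest crack in Equation (8) reaches $a$. The sketch below assumes constant growth rates, so that $\Lambda_i(u) = \lambda_i u$; this simplification is ours. It simulates one shared Wiener path per run and checks, on a time grid, that the crack-size criterion and the boundary criterion for $W(u)$ flag failure together. The fraction of runs that never fail estimates $P(T > t)$. The names and parameter values are hypothetical.

```python
import math
import random


def simulate_event(rates, sigma, x0, threshold, t_max, n_steps, rng):
    """One shared Wiener path; returns (failed by crack sizes, failed by Eq. (9) boundary).

    Crack i follows Equation (8) with constant growth rate rates[i], so
    Lambda_i(u) = rates[i] * u (a simplifying assumption).
    """
    dt = t_max / n_steps
    c = (math.log(threshold) - math.log(x0)) / sigma  # c = b / sigma
    w = u = 0.0
    by_size = by_boundary = False
    for _ in range(n_steps):
        w += rng.gauss(0.0, math.sqrt(dt))
        u += dt
        largest = max(x0 * math.exp(r * u - 0.5 * sigma ** 2 * u + sigma * w) for r in rates)
        boundary = min(0.5 * sigma * u - r * u / sigma for r in rates) + c
        by_size = by_size or largest >= threshold
        by_boundary = by_boundary or w >= boundary
    return by_size, by_boundary


rng = random.Random(9)
runs = [simulate_event([0.05, 0.1, 0.2], 0.3, 1.0, 2.0, 5.0, 200, rng) for _ in range(800)]
p_survive = sum(not size for size, _ in runs) / len(runs)  # estimate of P(T > 5)
print(p_survive)
```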

Motivated by a model of Durham and Padgett [30], Park and Padgett [31] propose a model for cumulative damage, assuming that the damage is an observable, measurable entity. This amounts to interpreting cumulative damage as a marker, like crack growth. The scheme proposed by Park and Padgett [31] is noteworthy because it facilitates the introduction of both a Brownian motion process and a gamma process as the driving processes for crack growth. Here, for some functions $c(\cdot)$ and $h(\cdot)$, it is assumed that

$$dc(X(t)) = h(X(t))\,dD(t), \qquad (10)$$

where $D(t)$ is the damage at $t$, and $X(t)$ is the cumulative damage at $t$. As a consequence of the above,

$$\int_0^t \frac{dc(X(u))}{h(X(u))} = \int_0^t dD(u) = D(t) - D(0). \qquad (11)$$

By choosing various forms for the functions $c(\cdot)$ and $h(\cdot)$, and a stochastic process for $\{D(t); t \ge 0\}$, different models for $X(t)$ can be attained. For example, with $h(u) = 1$, $c(u) = \log u$, and a Brownian motion (or Wiener process) for $\{D(t); t \ge 0\}$, we obtain a geometric Brownian motion process for $X(t)$. With $h(u) = 1$, $c(u) = u$, and a Brownian motion process for $\{D(t); t \ge 0\}$, we obtain a Gaussian process for $X(t)$. With $h(u) = 1$, $c(u) = u$, and a gamma process for $\{D(t); t \ge 0\}$, we obtain a gamma process for $X(t)$. Whereas the Gaussian process is not always positive, the geometric Brownian motion process is always positive but not increasing. In contrast, the gamma process is both positive and increasing.
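With $h \equiv 1$, Equation (11) reduces to $c(X(t)) = c(X(0)) + D(t) - D(0)$. A path of $X$ is therefore obtained by pushing a path of $D$ through $c^{-1}$. The sketch below reproduces the three cases just listed and checks their qualitative properties: the log transform gives a positive but non-monotone path, while the gamma driver gives a nondecreasing one. The drift, volatility, gamma parameters, and function names are hypothetical.

```python
import math
import random


def driving_path(kind, t_max, n_steps, rng, drift=0.2, vol=0.3, rate=1.0, scale=0.5):
    """D(t) on a grid with D(0) = 0: 'bm' = Brownian motion with drift, else a gamma process."""
    dt = t_max / n_steps
    d = [0.0]
    for _ in range(n_steps):
        if kind == "bm":
            d.append(d[-1] + drift * dt + vol * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        else:
            d.append(d[-1] + rng.gammavariate(rate * dt, scale))
    return d


def damage_from_driver(d, x0, c, c_inv):
    """With h = 1, Equation (11) gives c(X(t)) = c(X(0)) + D(t) - D(0)."""
    return [c_inv(c(x0) + (dv - d[0])) for dv in d]


rng = random.Random(2)
bm = driving_path("bm", 5.0, 500, rng)
gbm = damage_from_driver(bm, 1.0, math.log, math.exp)              # c(u) = log u
gaussian = damage_from_driver(bm, 1.0, lambda u: u, lambda v: v)    # c(u) = u, Brownian driver
gam = damage_from_driver(driving_path("gamma", 5.0, 500, rng), 1.0, lambda u: u, lambda v: v)
print(min(gbm), gaussian[-1], gam[-1])
```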
Characteristics of the hitting times of the geometric Brownian motion and the gamma process to a fixed and known threshold are also obtained by Park and Padgett [31]. This completes our overview of stochastic process models for the damage process and the marker process, when viewed separately, save for the work of Desmond [32], who elaborates on a two-parameter family of life distributions introduced by Birnbaum and Saunders [33]. This distribution is motivated via the notion that failure caused by fatigue is due to the initiation, growth, and extension of a dominant crack past some critical length.

The essence of Desmond’s [32] idea is based on the notion that it is the environmental stresses, called impulses, that cause a crack to grow, so that if $X_i$ is the size of the crack after the $i$th impulse, then

$$X_{i+1} = X_i + \Pi_{i+1}\,g(X_i), \quad i = 0, 1, 2, \ldots, \qquad (12)$$

where $\Pi_i$ is taken to be the magnitude of the $i$th impulse, e.g., the stress caused by the $i$th impulse. The $\Pi_i$'s are assumed to be random. If $\Delta X_i = X_{i+1} - X_i$ is taken to be sufficiently small, then

$$\sum_{i=1}^{n} \Pi_i \approx \int_{X_0}^{X_n} \frac{dy}{g(y)} \qquad (13)$$

is approximately normal; $g(y)$ is some function of $y$. The quantity $X_0$ is the initial size of the dominant crack.

With g(y) = 1, and assuming that the $\Pi_i$'s have a common mean μ and variance σ², we have, with time t measured by the number of impulses,

$$\int_{X_0}^{X(t)} \frac{dy}{g(y)} \sim N(t\mu,\, t\sigma^2). \tag{14}$$

If $X_c$ denotes the critical crack size, and T the time to failure of the unit experiencing the impulses, then

$$T = \inf\{t : X(t) \ge X_c\}. \tag{15}$$

Simple manipulations show that

$$P(T \le t) = \Phi\!\left(\frac{t\mu - \int_{X_0}^{X_c} dy/g(y)}{\sigma\sqrt{t}}\right), \tag{16}$$

where Φ(·) is the standard normal distribution function. The distribution function given above is a member of the Birnbaum-Saunders [33] family of distributions.
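Equation (16) is straightforward to evaluate numerically. The sketch below assumes that g is supplied as a Python function and approximates the integral of dy/g(y) with a midpoint rule; the function names and the default g(y) = 1 are illustrative choices, and the midpoint rule is a stand-in for whatever quadrature one prefers.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def integral_of_inverse_g(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of dy / g(y) from a to b."""
    step = (b - a) / n
    return sum(step / g(a + (k + 0.5) * step) for k in range(n))


def failure_cdf(t, mu, sigma, x0, xc, g=lambda y: 1.0):
    """P(T <= t) as in (16): Phi((t*mu - W) / (sigma*sqrt(t))), where W is the
    integral of dy / g(y) from the initial crack size x0 to the critical size xc."""
    if t <= 0:
        return 0.0
    w = integral_of_inverse_g(g, x0, xc)
    return standard_normal_cdf((t * mu - w) / (sigma * math.sqrt(t)))


# With g(y) = 1 the integral is xc - x0 = 2, so P(T <= t) = 0.5 when t * mu = 2.
print(failure_cdf(4.0, 0.5, 0.2, 1.0, 3.0))
print(failure_cdf(8.0, 0.5, 0.2, 1.0, 3.0))
```

Passing, for instance, `g=lambda y: y` replaces the integral by ln(X_c/X_0), which shows how Desmond's g(·) enters only through that single constant.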
The Hazard Potential: Introduction and Overview
This is an expository article directed at reliability theorists, survival analysts, and others interested in looking at life history and event data. Here we introduce the notion of a hazard potential as an unknown resource that an item is endowed with at inception. The item fails when this resource becomes depleted. The cumulative hazard is a proxy for the amount of resource consumed, and the hazard function is a proxy for the rate at which this resource is consumed. With this conceptualization of the failure process, we are able to characterize accelerated, decelerated, and normal tests and are also able to provide a perspective on the cause of interdependent lifetimes. Specifically, we show that dependent life lengths are the result of dependent hazard potentials. Consequently, we are able to generate new families of multivariate life distributions using dependent hazard potentials as a seed. For an item that operates in a dynamic environment, we argue that its lifetime is the killing time of a continuously increasing stochastic process by a random barrier, and this barrier is the item's hazard potential. The killing time perspective enables us to see competing risks from a process standpoint and to propose a framework for the joint modeling of degradation or cumulative damage and its markers. The notion of the hazard potential generalizes to the multivariate case. This generalization enables us to replace a collection of dependent random variables by a collection of independent exponentially distributed random variables, each having a different time scale.
1. INTRODUCTION AND OVERVIEW

1.1  Preliminaries:  The  Hazard  Rate  and 
the  Hazard  Potential 

The mathematical theory of reliability, the statistical theory of life history or survival analysis, and the underlying premise of actuarial sciences are driven by a notion unique to them, the hazard rate function (see, e.g., Gjessing, Aalen, and Hjort 2003). The hazard rate function is both a theoretical and a descriptive tool that also plays a fundamental role in event history analysis. Specifically, there is a parallel between the hazard rate function and the intensity function of a nonhomogeneous Poisson process (see Grandell 1975), and also between the intensity function of a doubly stochastic Poisson process and the hazard rate function when the latter is viewed as a stochastic process (see Kebir 1991). The hazard function has two virtues: (a) an interpretive content, in the sense that the aging characteristics of single and one-of-a-kind items can be encapsulated by the shape of the hazard function, and (b) under some regularity conditions (see Yashin and Arjas 1988; Singpurwalla and Wilson 1995), the hazard function uniquely determines a survival function. There are other scenarios in which (a) is also germane; these have been alluded to by Gjessing et al. (2003). Some examples are an understanding of neuronal degeneration, the sleep-wake cycles of individuals, and the longevity of humans (see Gavrilov and Gavrilova 2001).

This is an expository article directed at reliability theorists, survival analysts, actuaries, and others interested in event history analysis. Our purpose here is to introduce a new notion, the hazard potential (HP), as a conceptual tool that provides a different way of looking at the stochastic behavior of lifetimes. The term "potential" refers to a feature parallel to that of potential energy in physics. The difference here is that we are alluding to an item's resistance to failure rather than its capacity for work. In Section 3 of this article we put forth the view that the HP can be interpreted as the (random) amount of an unknown resource with which an item is endowed at inception, and that the item fails when this resource is depleted. Looking at lifetime in terms of a depleting resource can be more satisfying than looking at it in terms of conditional probabilities, which is what the hazard function represents.

Besides  providing  an  alternative  platform  for  conceptualizing 
the  process  that  leads  to  failure,  and  for  processes  that  compete 
for  failure,  the  HP  has  the  following  attractive  features: 

• It is inherently robust, in the sense that the HP of any and all items has an exponential (1) distribution on a suitably chosen time scale.

• It provides a context-free means for characterizing accelerated, decelerated, normal, and partially accelerated life tests.

• In the language of probabilistic causality (see Suppes 1970), it can be seen as either a prima facie or a genuine cause of dependence between lifetimes.

• It provides a vehicle for developing new families of univariate and multivariate survival functions by looking at the killing times of continuously increasing stochastic processes by random barriers.

• It offers a natural platform from which the abstract phenomenon of degradation (or damage accumulation) and its markers can be stochastically described.

The HP generalizes to the multivariate case. This generalization, when used in conjunction with the notion of a hazard gradient due to Marshall (1975a), enables us to represent a collection of dependent lifetimes in terms of a collection of independent exponential (1) random variables, each on a different time scale.

In light of the foregoing features, we may liken the HP to the notion of a hidden parameter in physics. Hidden parameters per se do not have a physical reality, but nonetheless are valuable because they provide explanations for observable phenomena.

1.2  Overview 

This article is organized as follows. In Section 2 we introduce our notation and review some basic relationships. In Section 3 we define the HP and interpret its nature from both physical and probabilistic standpoints. We also provide a way to formally distinguish between accelerated, decelerated, normal, and partially accelerated life tests from a context-free standpoint. The state of the art in accelerated testing seems vague when it comes to being specific about what a normal life test means; it treats this matter as a given. We conclude Section 3 by generalizing the HP to the nonexponential case through the notion of a G-hazard potential. In Section 4 we present several qualitative results pertaining to the claim that dependent HPs are a prima facie cause of dependent lifetimes, whereas a common HP is a genuine cause of dependence. Dependent HPs are a manifestation of commonalities in manufacturing or, in the context of biological units, a shared genetic makeup. In Section 5 we put the material of Section 4 to work by generating new families of dependent lifetimes using dependent HPs as a seed. In Section 6 we develop new families of survival functions for items destined to operate in random environments. The material here revolves around obtaining the distribution of the killing time of a continuously increasing stochastic process by a random barrier that is an item's HP. Although the approach of Section 6 is general, attention focuses only on the following processes: the running maxima of a Brownian motion, a Markov process with nonnegative increments, a family of nonnegative Lévy processes, and the integral of a geometric Brownian motion. The material of Sections 5 and 6 is not purely conceptual; it also has practical import. This can be seen as an argument in favor of looking at the HP as a useful tool. In Section 7 we explore the role of the HP in articulating the notion of competing-risk processes and in casting the phenomenon of degradation and its markers in a manner that accords with that described in the engineering and materials science literature. We devote Section 8 to the multivariate case, which entails a relationship between the hazard gradient and what we introduce as a conditional HP. This connection allows us to replace a collection of dependent lifetimes by a collection of independent exponential (1) lifetimes, each indexed by a different time scale. In Section 9 we close the article by reemphasizing the point of view that the HP offers an alternative perspective for appreciating the failure process and that it is a useful conceptual tool for understanding the cause of interdependent lifetimes in engineering and biological systems. We close Section 9 by expressing our hope that the HP could turn out to be as useful to reliability and survival analysis as the failure rate and the intensity functions.

2.  NOTATION,  TERMINOLOGY,  AND 
PRELIMINARY  RELATIONSHIPS 

Let T denote the (unknown) time to failure of a unit that is scheduled to operate in some environment, labeled $\mathcal{E}$. Based on the characteristics of the unit, and on knowledge of how the unit interacts with $\mathcal{E}$ (vis-à-vis T), one is able to subjectively specify h(t), t ≥ 0, the hazard rate function of P(T > t), the survival function of T, assumed to be absolutely continuous. We interpret h(t) through the relationship

$$h(t)\,dt \approx P(t < T \le t + dt \mid T > t),$$

where the right side is a conditional probability. A formal definition of h(t) can be found in the recent book of Aven and Jensen (1999). We claim that the hazard function is a theoretical (or abstract) notion because, unlike lifetimes, which can be directly observed, conditional probabilities are either subjectively specified or inferred from data.

Let $H(t) = \int_0^t h(u)\,du$; H(t) is known as the cumulative hazard at time t. Observe that H(t) is nondecreasing in t. But what does H(t) mean? Whereas h(t) dt can be given an intuitive import, H(t) cannot! It is not the sum of conditional probabilities (because the conditioning event changes with t), and there is no law of probability that leads us to H(t). Thus H(t) does not have a probabilistic connotation. Yet H(t) plays a key role in reliability and survival analysis because of the exponentiation formula (see Barlow and Proschan 1975, p. 53), which says that with H(t) specified,

$$P(T > t;\, H(t), t \ge 0) = e^{-H(t)}. \tag{1}$$

In the foregoing equation, and in those that follow, we adopt the convention that all quantities to the right of the semicolon are viewed as being specified. In contrast, all quantities to the right of the vertical slash are conditioned on, that is, treated as if they were known.

Equation (1) relates the survival function P(T > t) to H(t); however, H(t) lacks an interpretive content. Our interest in this article is motivated by the desire to interpret H(t) in a manner that provides insight into the relationship of (1).
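The exponentiation formula (1) can be checked numerically: integrate a specified hazard rate to obtain H(t), then exponentiate. A minimal sketch, assuming a trapezoidal rule and a hypothetical Weibull hazard h(u) = 2u, chosen only because its cumulative hazard t² is known in closed form.

```python
import math


def cumulative_hazard(h, t, n=10_000):
    """Trapezoidal approximation of H(t), the integral of h(u) du over [0, t].
    Assumes h is finite on [0, t] (e.g., not a Weibull hazard with shape < 1)."""
    if t <= 0:
        return 0.0
    step = t / n
    inner = sum(h(k * step) for k in range(1, n))
    return step * (0.5 * h(0.0) + inner + 0.5 * h(t))


def survival_from_hazard(h, t):
    """Exponentiation formula (1): P(T > t; H(t), t >= 0) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(h, t))


# Weibull hazard with shape 2 and unit scale: h(u) = 2u, so H(t) = t**2.
print(survival_from_hazard(lambda u: 2.0 * u, 1.5), math.exp(-1.5 ** 2))
```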

In the case of a one-of-a-kind item, h(t) dt encapsulates an assessor's judgment about the inherent quality of an item and the environment in which it operates. By quality, we mean a resistance to failure-causing agents, such as crack growth, weakening of the immune system, and so on. Consequently, the hazard rate of an item of poor quality that operates in a benign environment could be smaller than that of a high-quality item that operates in a harsh environment. In effect, the quantity h(t) dt encapsulates an assessor's subjective view of the manner in which an item and its environment interact. Thus, in principle, h(t) dt does not have a physical reality.

Turning attention to the right side of (1), we note that $e^{-H(t)}$ is the survival function of an exponentially distributed random variable, say X, with scale parameter 1, evaluated at H(t); that is,

$$P(T > t;\, H(t), t \ge 0) = e^{-H(t)} = P(X > H(t) \mid 1). \tag{2}$$

3.  INTERPRETATION:  THE  NOTION  OF 
A  HAZARD  POTENTIAL 

Thus far, we have introduced three quantities, X, H, and T. Given any two of these, we can find the third using (2). But what insight can (2) provide about H(t) and X? We see two possibilities, one providing an indifference principle for reliability and survival analysis and the other having a physical connotation.

To appreciate the first, we see from (2) that, corresponding to every nonnegative random variable T having an absolutely continuous survival function $\bar F(t) = P(T > t)$, there exists a random variable X, taking values H(t), 0 ≤ H(t) < ∞, whose survival function is exponential with a scale parameter of 1. The survival function of T is indexed by t, t ≥ 0, whereas that of X is indexed by $H(t) = -\int_0^t d\bar F(u)/\bar F(u)$. We can summarize the foregoing in the following theorem.
Theorem 1. The lifetime of any and all items has an exponential (1) distribution on H(t), the cumulative hazard, as the scale.

The essence of Theorem 1 has been noted by Cinlar and Ozekici (1987); it is stated here as a prelude to Theorem 5, which pertains to the multivariate case. In the context of point processes, Theorem 1 has a parallel with the result that any nonhomogeneous Poisson process can be transformed by a change in clock time to a homogeneous Poisson process with rate one (see Kingman 1964). This parallel leads us to make precise the notions of accelerated and normal life tests in Section 3.1.
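Theorem 1 is easy to see by simulation. In the sketch below, lifetimes are drawn from a Weibull distribution (an arbitrary choice for illustration), each is mapped to X = H(T), and the sample mean of X and the fraction with X > 1 are compared with the exponential (1) values 1 and e^{-1}.

```python
import math
import random


def transformed_lifetimes(shape, scale, n, seed=0):
    """Draw Weibull lifetimes T and return X = H(T).

    random.weibullvariate(alpha, beta) uses alpha as the scale and beta as the
    shape, with survival function exp(-(t/alpha)**beta); hence H(t) = (t/alpha)**beta."""
    rng = random.Random(seed)
    return [(rng.weibullvariate(scale, shape) / scale) ** shape for _ in range(n)]


xs = transformed_lifetimes(shape=3.0, scale=2.5, n=100_000)
mean = sum(xs) / len(xs)
tail = sum(1 for x in xs if x > 1.0) / len(xs)
print(round(mean, 3), round(tail, 3), round(math.exp(-1.0), 3))
```

Changing `shape` and `scale` changes the distribution of T but, up to sampling noise, not that of X, which is the content of the theorem.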

3.1  The  Physical  Connotation 

To appreciate the physical connotation implied by (2), we note that because

$$P(T \le t;\, H(t), t \ge 0) = P(X \le H(t) \mid 1),$$

we may claim that the time to failure T of an item coincides with the time at which the cumulative hazard H(t) crosses a random threshold X, where X has an exponential (1) distribution (Fig. 1); that is, $T = H^{-1}(X)$.
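The threshold-crossing reading T = H^{-1}(X) also gives a generic way to simulate lifetimes from any specified cumulative hazard: draw an exponential (1) HP and find the time at which H first reaches it. The sketch assumes that H is nondecreasing, that H(0) = 0, and that H(t) grows without bound; it uses bisection (an arbitrary but robust choice), and the function names are hypothetical.

```python
import random


def inverse_cumulative_hazard(H, x, t_hi=1.0, tol=1e-10):
    """Approximate the smallest t with H(t) >= x by bisection.
    Assumes H is nondecreasing, H(0) = 0, and H(t) -> infinity."""
    while H(t_hi) < x:  # grow the bracket until it contains the crossing
        t_hi *= 2.0
    lo, hi = 0.0, t_hi
    while hi - lo > tol * max(1.0, hi):  # relative tolerance for large t
        mid = 0.5 * (lo + hi)
        if H(mid) >= x:
            hi = mid
        else:
            lo = mid
    return hi


def sample_lifetime(H, rng):
    """Draw the hazard potential X ~ exponential(1), then return T = H^{-1}(X)."""
    return inverse_cumulative_hazard(H, rng.expovariate(1.0))


rng = random.Random(1)
H = lambda t: t * t  # hypothetical cumulative hazard
ts = [sample_lifetime(H, rng) for _ in range(20_000)]
print(sum(1 for t in ts if t > 1.0) / len(ts))  # compare with exp(-H(1)) = exp(-1)
```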

The random threshold X, where X = H(T), is defined as the HP of the item. Furthermore, because the exponential (1) distribution of X does not depend on $\mathcal{E}$, we may interpret X as an unknown resource with which the item is endowed at the time of its inception. With X considered a resource, H(t) can be interpreted as the amount of resource consumed by time t. Consequently, the hazard rate, h(t) = dH(t)/dt, can be considered the rate at which the resource is consumed. With this alternative perspective on H(t) and h(t), we may view a normal life test as one for which H(t) = t, a uniformly accelerated (decelerated) test as one for which H(t) > (<) t, and a partially accelerated (decelerated) test as one for which H(t) crosses t from above (below). The qualifier accelerated (decelerated) signals a contraction (expansion) of the clock time from t to H(t), and by shifting attention from the applied stress (which is what is normally done when discussing accelerated tests) to time, we achieve the context-free feature mentioned earlier. The concept of looking at failure as the depletion of a resource dates back to the Soviet physicist Sedyakin (1966), who enunciated this viewpoint without a formal architecture.
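Because these definitions compare H(t) with clock time only, they can be checked mechanically. The sketch below classifies a test from H evaluated on a grid of positive times; the grid, the tolerance `tol`, and the rule of labeling a crossing by the sign of H(t) - t at the first grid point where the two differ are assumptions made for illustration (a coarse grid can miss crossings between its points).

```python
def classify_life_test(H, ts, tol=1e-9):
    """Classify a life test by comparing H(t) with clock time t on a grid ts of positive times."""
    signs = []
    for t in ts:
        d = H(t) - t
        if d > tol:
            signs.append(1)    # more resource consumed than under a normal test
        elif d < -tol:
            signs.append(-1)   # less resource consumed than under a normal test
    if not signs:
        return "normal"        # H(t) = t on the whole grid
    if all(s == 1 for s in signs):
        return "uniformly accelerated"
    if all(s == -1 for s in signs):
        return "uniformly decelerated"
    # H(t) crosses t: starting above means "from above" (partially accelerated).
    return "partially accelerated" if signs[0] == 1 else "partially decelerated"


grid = [0.1 * k for k in range(1, 31)]  # t = 0.1, 0.2, ..., 3.0
for name, H in [("t", lambda t: t), ("2t", lambda t: 2 * t), ("sqrt(t)", lambda t: t ** 0.5)]:
    print(name, classify_life_test(H, grid))
```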

It is useful to note that the exponential (1) random variable X has an entropy of 1 and also the lack-of-memory property, and that both carry over to the lifetime T if and only if H(t) = t. A change in clock time from t to H(t) changes the entropy and destroys the memoryless property.

3.2  The  G-Hazard  Potential 

There is a generalization of Theorem 1 such that the HP can be made to have a distribution other than an exponential (1). Specifically, suppose that G is some absolutely continuous survival function with support [0, ∞); let $W = G^{-1}$ (because G is decreasing, so is W). Then it can be seen (Bagdonavicius and Nikulin 1999) that $Y = W(\bar F(T))$ has the survival function G, irrespective of $\mathcal{E}$. Consequently,

$$P(T > t) = P\big(W(\bar F(T)) > W(\bar F(t))\big) = P\big(Y > W(e^{-H(t)})\big),$$

so that

$$P(T \le t) = P\big(Y \le W(e^{-H(t)})\big). \tag{3}$$

Equation (3) implies that the item fails when $W(e^{-H(t)})$ exceeds a threshold Y, where Y has the survival function G. We refer to Y as the G-hazard potential and to $W(e^{-H(t)})$ as the G-resource used until time t. Then we have, as a generalization of Theorem 1, the following result.

Theorem 2. The lifetime of any item can be made to have any absolutely continuous survival function G, provided that G is indexed by $G^{-1}(\exp(-H(t)))$.
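Theorem 2 can be illustrated by simulation. The sketch below assumes a hypothetical choice G(y) = exp(-y²), for which W(p) = G^{-1}(p) = sqrt(-ln p), and Weibull lifetimes with cumulative hazard H(t) = (t/2)^{1.5}; it forms Y = W(e^{-H(T)}) and checks that the empirical survival function of Y is close to G.

```python
import math
import random


def g_survival(y):
    """Hypothetical target survival function G(y) = exp(-y**2), y >= 0."""
    return math.exp(-y * y)


def w(p):
    """W = G^{-1} for the G above; note that W is decreasing in p."""
    return math.sqrt(-math.log(p))


def g_hazard_potentials(H, lifetimes):
    """Y = W(Fbar(T)), with Fbar(T) = exp(-H(T)) from the exponentiation formula."""
    return [w(math.exp(-H(t))) for t in lifetimes]


rng = random.Random(7)
H = lambda t: (t / 2.0) ** 1.5  # cumulative hazard of random.weibullvariate(2.0, 1.5)
lifetimes = [rng.weibullvariate(2.0, 1.5) for _ in range(100_000)]
ys = g_hazard_potentials(H, lifetimes)
frac = sum(1 for y in ys if y > 1.0) / len(ys)
print(round(frac, 3), round(g_survival(1.0), 3))
```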

As of now, Theorem 2 is mainly of academic interest; it is given here for completeness.

4.  HAZARD  POTENTIALS  AND 
DEPENDENT  LIFETIMES 

The aim of this section is to discuss the nature of dependence between lifetimes and offer a new perspective on the cause of interdependence. We argue that the HP offers a convenient platform for doing this. We view dependence and independence from a subjectivistic (de Finettian) viewpoint; that is, two events A and B are dependent if knowledge about B causes us to change our two-sided bets on A.

Figure 1. Relationship Between Cumulative Hazard, Threshold X, and Failure Time.

Because H(t) encapsulates an assessor's view about the interaction between an item's quality and its environment, it is likely that two different items operating in a common environment will have different H(t)'s, say $H_1(t)$ and $H_2(t)$. Similarly, for a single item, changing its environment from $\mathcal{E}_1$ to $\mathcal{E}_2$ will change its cumulative hazard from $H_1(t)$ to $H_2(t)$ (Fig. 2).

Figure 2 suggests that the lifetimes $T_1$ and $T_2$ of two items having the same hazard potential will be dependent. Equivalently, the lifetimes $T_1$ and $T_2$ of a single item scheduled to operate in two environments, $\mathcal{E}_1$ and $\mathcal{E}_2$, will also be dependent. However, from a subjectivistic perspective, the dependence will come into play only when one is able to specify $H_1(t)$ and $H_2(t)$, or a relationship between the two when only one of them is known. This is because knowledge of, say, $T_1$ together with $H_1(t)$ will tell us something about the unknown $X_1$, and if $X_1$ and $X_2$ are dependent, then knowledge of $X_1$ will enlighten us about $X_2$. Consequently, what we learn about $X_2$, together with $H_2(t)$, will change our assessment of $T_2$. To summarize, if the HPs $X_1$ and $X_2$ are dependent, then the lifetimes $T_1$ and $T_2$ will also be dependent, provided that $H_1(t)$ and $H_2(t)$ are known or a relationship between them is specified. In contrast, if $X_1$ and $X_2$ are independent, then so are $T_1$ and $T_2$, irrespective of whether or not $H_1(t)$ and $H_2(t)$ are known. These assertions are summarized in the remarks that follow.

Remark 1. When $H_1(t)$ and $H_2(t)$, t ≥ 0, are known, lifetimes $T_1$ and $T_2$ are independent if and only if their hazard potentials, $X_1$ and $X_2$, are independent.

Proof. When $X_1$ and $X_2$ are independent,

$$P(X_1 > H_1(t_1), X_2 > H_2(t_2)) = P(X_1 > H_1(t_1)) \cdot P(X_2 > H_2(t_2)),$$

for any $H_1(t_1)$ and $H_2(t_2)$. Consequently,

$$\begin{aligned}
P(T_1 > t_1, T_2 > t_2;\, H_1(t), H_2(t), t \ge 0) &= P(X_1 > H_1(t_1), X_2 > H_2(t_2))\\
&= P(X_1 > H_1(t_1)) \cdot P(X_2 > H_2(t_2))\\
&= P(T_1 > t_1;\, H_1(t), t \ge 0) \cdot P(T_2 > t_2;\, H_2(t), t \ge 0).
\end{aligned}$$

Thus, knowing $H_1(t)$ and $H_2(t)$, $T_1$ and $T_2$ are independent, and similarly for the converse.

When $H_i(t)$, t ≥ 0, is not known for i = 1, for i = 2, or for both, Remark 1 is weakened in the sense that only the "if" part holds. Specifically, $T_1$ and $T_2$ are independent even when $X_1$ and $X_2$ are dependent. The subjectivistic line of reasoning justifying this claim goes as follows.

Observing $T_1$ provides no insight about $X_1$, because $H_1(t)$ is not known. Consequently, there also is no insight into $X_2$ or $T_2$. Thus $T_1$ and $T_2$ are independent. Mathematically, without knowing $H_i(t)$, i = 1, 2, we are unable to relate $P(T_1 > t_1, T_2 > t_2)$ to the distribution of $X_1$ and $X_2$. We summarize the foregoing in the following remark.

Remark 2. Lifetimes $T_1$ and $T_2$ are independent whenever $H_1(t)$, $H_2(t)$, or both, t ≥ 0, are not known.

As a consequence of Remarks 1 and 2, we may state the following theorem.

Theorem 3. Lifetimes $T_1$ and $T_2$ are dependent if and only if their hazard potentials $X_1$ and $X_2$ are dependent and $H_1(t)$ and $H_2(t)$ are known.
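The mechanism behind Theorem 3 can be seen in a simulation. The sketch assumes dependent HPs drawn from the Marshall-Olkin bivariate exponential through its common-shock representation (X_i = min(Z_i, Z_12) with independent exponential Z's), with λ_1 = λ_2 = 1 - λ_12 so that each X_i is exponential (1), and two hypothetical known cumulative hazards, H_1(t) = t² and H_2(t) = 3t. With the H_i known, T_i = H_i^{-1}(X_i), and the empirical joint survival probability can be compared with the product of the empirical marginals.

```python
import math
import random


def simulate_lifetimes(lam12, h1_inv, h2_inv, n, seed=0):
    """Draw n pairs (T1, T2) = (H1^{-1}(X1), H2^{-1}(X2)), where (X1, X2) follows the
    Marshall-Olkin BVE with lam1 = lam2 = 1 - lam12, so each X_i is exponential(1)."""
    if not 0.0 < lam12 < 1.0:
        raise ValueError("need 0 < lam12 < 1")
    rng = random.Random(seed)
    lam = 1.0 - lam12
    pairs = []
    for _ in range(n):
        z1, z2 = rng.expovariate(lam), rng.expovariate(lam)
        z12 = rng.expovariate(lam12)          # common shock shared by both items
        x1, x2 = min(z1, z12), min(z2, z12)   # dependent hazard potentials
        pairs.append((h1_inv(x1), h2_inv(x2)))
    return pairs


def joint_and_product(pairs, t1, t2):
    """Empirical P(T1 > t1, T2 > t2) and the product of the empirical marginals."""
    n = len(pairs)
    joint = sum(1 for a, b in pairs if a > t1 and b > t2) / n
    p1 = sum(1 for a, _ in pairs if a > t1) / n
    p2 = sum(1 for _, b in pairs if b > t2) / n
    return joint, p1 * p2


# Hypothetical known cumulative hazards: H1(t) = t**2 and H2(t) = 3t.
pairs = simulate_lifetimes(0.5, math.sqrt, lambda x: x / 3.0, 100_000)
joint, product = joint_and_product(pairs, 1.0, 0.3)
print(round(joint, 3), round(product, 3), round(math.exp(-1.45), 3))
```

Here H_1(1) = 1 and H_2(0.3) = 0.9, so the Marshall-Olkin formula gives exp(-0.5 · 1.9 - 0.5 · 1) = exp(-1.45) for the joint probability, whereas the product of the marginals is close to exp(-1.9).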

Theorem 3 sets aside the often-expressed view that the lifetimes of items sharing a common environment are necessarily dependent (see Marshall 1975b; Lindley and Singpurwalla 1986), that is, that it is a common environment that causes dependence among lifetimes. Theorem 3 asserts that it is commonalities in the HPs, or identical HPs, both of which result in dependent HPs, that cause interdependent lifetimes. Dependent HPs are a manifestation of similarities in design, manufacture, or genetic makeup. In the language of probabilistic causality of Suppes (1970), the common environment can be interpreted as a spurious cause of dependent lifetimes, whereas dependent (or identical) HPs are their prima facie (or genuine) cause.

Figure 2. Effect of Changing Environment on Lifetimes (axis label: Cumulative Hazard, H(t)).

Theorem 3 can be used to generate new families of dependent lifetimes using multivariate distributions with exponential marginals as a seed; see Section 5. Remarks 1 and 2 pertain to the two extreme cases in which the $H_i(t)$'s, i = 1, 2, are either known or not. An intermediate case is one in which one of the $H_i(t)$'s, say $H_1(t)$, t ≥ 0, is known and the other is not, except for the fact that $H_1(t) \ge H_2(t)$. For such scenarios, we have the following.

Remark 3. Suppose that $H_1(t) \ge (\le)\, H_2(t)$ and that either $H_1(t)$ or $H_2(t)$, t ≥ 0, is known; then $X_1$ and $X_2$ dependent implies that $T_1$ and $T_2$ are also dependent.

Proof. The proof is by contradiction. For this, suppose that $X_1$ and $X_2$ have the bivariate exponential distribution (BVE) of Marshall and Olkin (1967); specifically, for $\lambda_1, \lambda_2, \lambda_{12} > 0$,

$$\begin{aligned}
P(X_1 > x, X_2 > y) &= \exp(-\lambda_1 x - \lambda_2 y - \lambda_{12}\max(x, y))\\
&= \exp(-(\lambda_1 + \lambda_{12})x - \lambda_2 y), \quad \text{if } x \ge y.
\end{aligned}$$

The marginal distribution of $X_i$ is $P(X_i > x) = \exp(-(\lambda_i + \lambda_{12})x)$, i = 1, 2. For the $X_i$'s to be dependent HPs, we need to have $\lambda_i + \lambda_{12} = 1$, for i = 1, 2, and $\lambda_{12} > 0$; this would imply that $\lambda_1 = \lambda_2$. Thus, for x ≥ y,

$$P(X_1 > x, X_2 > y) = \exp(-(x + \lambda_2 y)).$$

If we set $x = H_1(t_1)$ and $y = H_2(t_2)$, for some $t_1, t_2 > 0$ with x ≥ y, then the assumption $H_1(t) \ge H_2(t)$ implies that $H_2(t_2) = H_1(t_2) - \delta$, for some unknown δ ≥ 0. Consequently,

$$P(X_1 > x, X_2 > y) = P(X_1 > H_1(t_1), X_2 > H_1(t_2) - \delta) = \exp\big(-(H_1(t_1) + \lambda_2(H_1(t_2) - \delta))\big). \tag{4}$$

Given the foregoing, we need to show that $T_1$ and $T_2$ are dependent. Suppose that they are not; then

$$\begin{aligned}
P(T_1 > t_1, T_2 > t_2;\, H_1(t_1), H_2(t_2), t_1, t_2 > 0) &= P(T_1 > t_1;\, H_1(t_1), t_1 > 0)\, P(T_2 > t_2;\, H_2(t_2), t_2 > 0)\\
&= P(X_1 > H_1(t_1))\, P(X_2 > H_2(t_2))\\
&= \exp(-H_1(t_1)) \exp(-(H_1(t_2) - \delta))\\
&= P(X_1 > H_1(t_1), X_2 > H_1(t_2) - \delta),
\end{aligned} \tag{5}$$

because the first term of (5) does not entail elements of the second term. Thus we have

$$P(X_1 > H_1(t_1), X_2 > H_1(t_2) - \delta) = \exp\big(-(H_1(t_1) + H_1(t_2) - \delta)\big). \tag{6}$$

Equation (6) agrees with (4) if $\lambda_2 = 1$. However, $\lambda_2 = 1$ implies that $\lambda_{12} = 0$, which contradicts the hypothesis that $X_1$ and $X_2$ are dependent. The proof when $H_1(t) \le H_2(t)$ follows along similar lines.

A broader, but weaker, version of Remark 1 pertains to the case where $X_1$ and $X_2$ are exchangeable. Here again, we require that $H_i(t)$, i = 1, 2, t ≥ 0, be specified. We then have the following result.


Remark 4. If the hazard potentials $X_1$ and $X_2$ are exchangeable and if $H_1(t), H_2(t)$, t ≥ 0, are known, then the lifetimes $T_1$ and $T_2$ are also exchangeable.

Proof. Let $x = H_1(t_1)$ and $y = H_2(t_2)$ for any $t_1, t_2 > 0$; then

$$P(X_1 > x, X_2 > y) = P(T_1 > t_1, T_2 > t_2;\, H_1(t_1), H_2(t_2)).$$

Similarly,

$$P(X_1 > y, X_2 > x) = P(T_1 > t_2, T_2 > t_1;\, H_1(t_1), H_2(t_2)).$$

Because the exchangeability of $X_1$ and $X_2$ implies that

$$P(X_1 > x, X_2 > y) = P(X_1 > y, X_2 > x),$$

the statement of the remark now follows.

5.  GENERATING  NEW  FAMILIES  OF 
DEPENDENT  LIFETIMES 

The aim of this section is to put Theorem 3 to work. Here we show how dependent HPs can be used to generate new families of multivariate distributions, using multivariate distributions with unit exponential marginals as a seed. Of course, this is by no means the only way to generate multivariate distributions. For the purpose of illustration, we limit attention to the bivariate case and consider as seeds the bivariate exponentials of Marshall and Olkin (1967), Gumbel (1960), and Singpurwalla and Youngren (1993; henceforth S-Y), and a bivariate exponential induced by the copula of a bivariate Pareto distribution.

5.1  The  Bivariate  Exponential  of  Marshall  and  Olkin 

Suppose that the HPs $X_1$ and $X_2$ have the BVE of Marshall and Olkin (1967), with $\lambda_1$, $\lambda_2$, and $\lambda_{12}$ as parameters. To ensure that the marginal distributions are unit exponentials, we need to have $\lambda_1 = \lambda_2 = \lambda$ and $\lambda + \lambda_{12} = 1$, with $\lambda_{12} > 0$; the latter inequality ensures dependence between $X_1$ and $X_2$.

Let $T_1$ and $T_2$ be the lifetimes corresponding to $X_1$ and $X_2$, with cumulative hazard functions $H_1(t_1)$ and $H_2(t_2)$. Then, because

$$\begin{aligned}
P(T_1 > t_1, T_2 > t_2;\, \cdot) &= P(X_1 > H_1(t_1), X_2 > H_2(t_2);\, \lambda, \lambda_{12})\\
&= \exp[-\lambda(H_1(t_1) + H_2(t_2)) - \lambda_{12}\max(H_1(t_1), H_2(t_2))],
\end{aligned}$$

we can generate families of bivariate distributions for $T_1$ and $T_2$ by assuming specific forms for $H_i(t)$, i = 1, 2. In particular, if $H_i(t_i) = (\alpha_i t_i)^{\beta_i}$, i = 1, 2, then

$$P(T_1 > t_1, T_2 > t_2;\, \cdot) = \exp\big[-\big\{\lambda[(\alpha_1 t_1)^{\beta_1} + (\alpha_2 t_2)^{\beta_2}] + \lambda_{12}\max[(\alpha_1 t_1)^{\beta_1}, (\alpha_2 t_2)^{\beta_2}]\big\}\big],$$

which is a bivariate Weibull of the Marshall-Olkin type.
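The bivariate Weibull of the Marshall-Olkin type above is simple to code. In the sketch below, the parameter values are arbitrary, λ is set to 1 - λ_12 as required for unit-exponential seeds, and the checks confirm two properties implied by the formula: setting t_2 = 0 recovers the Weibull marginal exp[-(α_1 t_1)^{β_1}], and for λ_12 > 0 the joint survival probability is at least the product of the marginals.

```python
import math


def mo_bivariate_weibull_survival(t1, t2, alpha, beta, lam12):
    """P(T1 > t1, T2 > t2) for the Marshall-Olkin-type bivariate Weibull,
    with H_i(t) = (alpha_i * t)**beta_i and lam = 1 - lam12 (unit-exponential HPs)."""
    if not 0.0 <= lam12 <= 1.0:
        raise ValueError("need 0 <= lam12 <= 1")
    lam = 1.0 - lam12
    h1 = (alpha[0] * t1) ** beta[0]
    h2 = (alpha[1] * t2) ** beta[1]
    return math.exp(-lam * (h1 + h2) - lam12 * max(h1, h2))


alpha, beta, lam12 = (0.8, 1.2), (2.0, 0.7), 0.4
joint = mo_bivariate_weibull_survival(1.3, 0.9, alpha, beta, lam12)
marg1 = mo_bivariate_weibull_survival(1.3, 0.0, alpha, beta, lam12)
marg2 = mo_bivariate_weibull_survival(0.0, 0.9, alpha, beta, lam12)
print(joint, marg1 * marg2)
```

Because max(H_1, H_2) never exceeds H_1 + H_2, the exponent of the joint survival function is never more negative than that of the product of the marginals, which is why the inequality holds.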
If  Hi(ti)  =  at  ln(l  +  ^|/|),  i=l,2,  then 

7>(7i  >  t\,T2  >t2;  ■) 


1  \Aai  /  1  \*a2 

1  4-  f\t\  )  \  1  +  fih) 
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which  resembles  the  bivariate  distribution  of  Muliere  and 
Scarsini  (see  Kotz,  Balakrishnan,  and  Johnson  2000,  hence¬ 
forth  KBJ,  pp.  408  and  595).  This  distribution  is  also  known 
as  the  Marsha]  1-Olkin-type  Pareto  distribution  (see  KBJ  2000, 
p.  612).  Note  that  ///ft)  =  [a,  ln(l  +  piU)]  corresponds 

to  an  increasing  (decreasing)  rate  of  consumption  of  the  HP. 

Continuing  in  the  foregoing  vein,  if  ///ft)  =  a, {e^li  —  1), 
i  =  1,2,  then  the  induced  distribution  of  T\  and  T'i  is  given  as 

P(X\  >  ti,  T2  >  h\  •) 

—  e(<*i+<x2)i  .  exp[— [a\Xe^tx  -\-a2^7t7 
+  kn  max(ai(^l/l  —  1),  a2(e^-\))}], 


and if $H_i(t_i) = (1 - e^{-t_i})/(1 + e^{-t_i})$, $i = 1, 2$, the logistic function, then

$$P(T_1 > t_1, T_2 > t_2; \cdot) = \exp\left[-\left\{\lambda_1\frac{1 - e^{-t_1}}{1 + e^{-t_1}} + \lambda_2\frac{1 - e^{-t_2}}{1 + e^{-t_2}} + \lambda_{12}\max\left(\frac{1 - e^{-t_1}}{1 + e^{-t_1}}, \frac{1 - e^{-t_2}}{1 + e^{-t_2}}\right)\right\}\right].$$

Neither of these distributions is of a recognized form. The first form of $H_i(t_i)$ corresponds to an exponential rate of consumption of the HP, whereas the second corresponds to a rate that starts at 1/2 at $t = 0$ and decays to 0, so that $H_i(t_i)$ itself asymptotes to 1 as $t$ becomes infinite.

5.2 The Bivariate Exponential of Gumbel

Following the notation of Section 5.1, suppose that for some parameter $0 \le \theta \le 1$,

$$P(X_1 > H_1(t_1), X_2 > H_2(t_2); \theta) = \exp\left[-H_1(t_1) - H_2(t_2) - \theta H_1(t_1)H_2(t_2)\right].$$

This is the bivariate exponential of Gumbel (1960), with marginals that are always unit exponentials. If $H_i(t_i) = (\alpha_i t_i)^{\beta_i}$, $i = 1, 2$, then the induced distribution of $T_1$ and $T_2$ is

$$P(T_1 > t_1, T_2 > t_2; \cdot) = \exp\left[-\left\{(\alpha_1 t_1)^{\beta_1} + (\alpha_2 t_2)^{\beta_2} + \theta(\alpha_1 t_1)^{\beta_1}(\alpha_2 t_2)^{\beta_2}\right\}\right];$$

we call this distribution the bivariate Weibull of the Gumbel type.

If $H_i(t_i) = \alpha_i \ln(1 + \beta_i t_i)$, $i = 1, 2$, then

$$P(T_1 > t_1, T_2 > t_2; \cdot) = \left(\frac{1}{1+\beta_1 t_1}\right)^{\alpha_1}\left(\frac{1}{1+\beta_2 t_2}\right)^{\alpha_2}\exp\left(-\theta\alpha_1\alpha_2\ln(1+\beta_1 t_1)\ln(1+\beta_2 t_2)\right),$$

which is a bivariate distribution with Pareto marginals; we call this distribution a bivariate Pareto of the Gumbel type.

5.3 The Bivariate Exponential of S-Y

Here again, we follow the notation of Section 5.1 and suppose that for some parameter $m$,

$$P(X_1 > H_1(t_1), X_2 > H_2(t_2); m) = \frac{1 - m\min(H_1(t_1), H_2(t_2)) + m\max(H_1(t_1), H_2(t_2))}{1 + m\left(H_1(t_1) + H_2(t_2)\right)}\exp\left(-\frac{m}{2}\max(H_1(t_1), H_2(t_2))\right).$$

This distribution has unit exponential marginals if $m = 2$. With $m = 2$ and $H_1(t_1) > H_2(t_2)$,

$$P(X_1 > H_1(t_1), X_2 > H_2(t_2)) = \frac{1 - 2H_2(t_2) + 2H_1(t_1)}{1 + 2\left(H_1(t_1) + H_2(t_2)\right)}\exp(-H_1(t_1)).$$

The multivariate distributions for $T_1$ and $T_2$, when derived assuming that the $H_i(t_i)$ take any of the forms given in Section 5.1, are not of any recognizable type; they appear to be new. This is not surprising, because the bivariate exponential given earlier is also not of a well-recognized form.

5.4 Unit Exponentials Induced by Copulas

New families of multivariate distributions with unit exponential marginals can be created by the method of copulas and by invoking Sklar's theorem in reverse (see, e.g., Nelson 1995). We can then use these multivariate exponentials as a seed for generating other families of multivariate distributions.

As an example of the foregoing, consider a bivariate Pareto distribution of the form

$$P(X_1 > x_1, X_2 > x_2; \cdot) = \left(\frac{b}{b + x_1 + x_2}\right)^{a+1};$$

its copula, for $0 \le u, v \le 1$, is

$$C_a(u, v) = u + v - 1 + \left[(1 - u)^{-1/(a+1)} + (1 - v)^{-1/(a+1)} - 1\right]^{-(a+1)}.$$

If we set $u = 1 - \exp(-H_1(t_1))$ and $v = 1 - \exp(-H_2(t_2))$, then it can be seen (see Singpurwalla and Kong 2004) that

$$P(X_1 > H_1(t_1), X_2 > H_2(t_2); a) = \left[\exp\left(\frac{H_1(t_1)}{a+1}\right) + \exp\left(\frac{H_2(t_2)}{a+1}\right) - 1\right]^{-(a+1)},$$

which is a bivariate distribution with unit exponentials as marginals. We may now choose any desired form for the $H_i(t_i)$, $i = 1, 2$, to produce new families of bivariate distributions of the form $P(T_1 > t_1, T_2 > t_2; \cdot)$.
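The Gumbel exponential and the copula of the bivariate Pareto should both leave each HP with a unit exponential marginal, whatever $H_i$ is chosen. A small sketch, assuming hypothetical values of $\theta$, $a$, and the $H_i$, evaluates both HP survival functions, checks the marginals, and confirms that $1 - u - v + C_a(u, v)$ agrees with the closed form written in terms of $H_1$ and $H_2$.

```python
import math

THETA, A = 0.7, 2.0          # hypothetical Gumbel parameter and Pareto-copula parameter a

def gumbel_hp(h1, h2):
    """Gumbel (1960) bivariate exponential for the HPs: P(X1 > h1, X2 > h2)."""
    return math.exp(-h1 - h2 - THETA * h1 * h2)

def pareto_copula(u, v, a=A):
    """Copula C_a(u, v) of the bivariate Pareto (b / (b + x1 + x2)) ** (a + 1)."""
    c = a + 1.0
    return u + v - 1.0 + ((1 - u) ** (-1 / c) + (1 - v) ** (-1 / c) - 1.0) ** (-c)

def copula_hp(h1, h2, a=A):
    """Joint survival 1 - u - v + C_a(u, v) with u = 1 - e^{-h1}, v = 1 - e^{-h2}."""
    u, v = -math.expm1(-h1), -math.expm1(-h2)
    return 1.0 - u - v + pareto_copula(u, v, a)

def copula_hp_closed(h1, h2, a=A):
    """Closed form [exp(h1/(a+1)) + exp(h2/(a+1)) - 1] ** -(a+1)."""
    c = a + 1.0
    return (math.exp(h1 / c) + math.exp(h2 / c) - 1.0) ** (-c)

# Illustrative cumulative hazards: Weibull type for T1, logarithmic for T2.
def H1(t):
    return (0.5 * t) ** 1.8

def H2(t):
    return 2.0 * math.log1p(0.3 * t)

for t1, t2 in [(0.5, 1.0), (2.0, 3.0)]:
    h1, h2 = H1(t1), H2(t2)
    print(gumbel_hp(h1, h2), copula_hp(h1, h2), copula_hp_closed(h1, h2))
```

Swapping in other $H_i$ changes the induced family of $(T_1, T_2)$ but not the unit exponential marginals of the HPs.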


6.  CUMULATIVE  HAZARD  PROCESSES  AND 
RANDOM  KILLING 

Our discussion thus far has been based on the premise that $H(t)$ is a deterministic function of $t$. This may be a reasonable first step. A more meaningful strategy is to assume that $H(t)$ is described by some nondecreasing and nonnegative stochastic process $\{H(t); t \ge 0\}$. There is some precedent for doing so in both the biostatistical and the reliability literature (see Singpurwalla 1995), although the motivation there is different from what we give here. This is because we see $H(t)$ as a proxy for usage until time $t$, and conceptualizing usage as a random process is more natural than simply declaring that the cumulative hazard is a stochastic process. With $H(t)$ described as a stochastic process, the time to failure $T$ will be the hitting time of $\{H(t); t \ge 0\}$ to a random barrier $X$, which is the HP of the item; see Figure 1. Put alternatively, the lifetime of an item corresponds to the killing time of $\{H(t); t \ge 0\}$ by a random threshold $X$.

The notion that lifetimes correspond to hitting times of stochastic processes to some barrier was also explored in the pioneering work of Esary, Marshall, and Proschan (1973; henceforth EMP) and in the more recent works of Durham and Padgett (1997), Pettit and Young (1999), Yang and Klutke (2000), and Duchesne and Rosenthal (2003), the difference being that to these authors, the underlying stochastic process is an observable phenomenon such as degradation, aging, or cumulative damage. A consequence of the foregoing is that the results thus obtained pertain to specific scenarios. In contrast, the approach of considering any failure time as the hitting time of a process $\{H(t); t \ge 0\}$ to a random threshold $X$ whose distribution is exponential(1) provides a common architecture for developing classes of survival functions, with each class determined by the nature of the process. For example, when $\{H(t); t \ge 0\}$ is a nonnegative nondecreasing Lévy process (special cases of which are the compound Poisson, the gamma, and the stable), a general result for the survival function is obtained. We discuss this and related matters in what follows.

6.1  The  Hazard  Rate  and  Cumulative  Hazard  Process 

The purpose of this section is to obtain a result analogous to that of (2) when $H(t)$ is a stochastic process. To obtain an analog to the left side of (2), we proceed formally by considering a probability space $(\Omega, \mathcal{F}, P)$ on which all random variables and processes are defined.

Let $\{h(s); s \ge 0\}$ be a nonnegative and right-continuous stochastic process, and let $T$ be a real-valued random variable denoting the lifetime of an item. For $t \ge 0$, we define the $\sigma$-algebras $\mathcal{F}_t$ and $\mathcal{F}$ as

$$\mathcal{F}_t = \sigma\{h(s); s \le t\} \quad \text{and} \quad \mathcal{F} = \sigma\{h(s); s \ge 0\}.$$

Then $\{h(s); s \ge 0\}$ is defined as the hazard rate process of $T$ if, for $t \ge 0$,

$$P(T > t \mid \mathcal{F}) = \exp\left(-\int_0^t h(s)\,ds\right).$$

It now follows, from a result of Pitman and Speed (1973), that $T$ is a randomized stopping time, so that

$$P(T > t \mid \mathcal{F}_t) = \exp\left(-\int_0^t h(s)\,ds\right), \qquad t \ge 0.$$

Consequently,

$$P(T > t) = E\left[\exp\left(-\int_0^t h(s)\,ds\right)\right],$$

or

$$P(T > t) = E[\exp(-H(t))], \tag{7}$$

where $\{H(t); t \ge 0\}$ is the cumulative hazard process. Equation (7) is our analog of the left side of (2).

For an analog of the right side of (2), we assume that $\{H(t); t \ge 0\}$ is a nonnegative, nondecreasing stochastic process and consider the hitting time of this process to a random threshold $X$ whose distribution is exponential(1). Then, assuming independence of $H(t)$ and $X$,

$$P(T > t) = P(X > H(t)) = \int_0^\infty \exp(-y)\,H_t(dy) = E[\exp(-H(t))], \tag{8}$$

where $H_t(\cdot)$ is the probability distribution of $H(t)$. Thus an analog to the right side of (2), with $\{H(t); t \ge 0\}$ a stochastic process, is

$$P(T > t) = E[\exp(-H(t))]. \tag{9}$$

The right side of (9) is the Laplace transform of $H(t)$ evaluated at 1, which for a Lévy process has an explicit form, namely

$$E[\exp(-H(t))] = \exp\left\{-t\int_0^\infty [1 - \exp(-y)]\,\nu(dy)\right\}, \tag{10}$$

where $\nu(dy)$ is the Lévy measure of the process and the integral term is the Laplace exponent of the Lévy process; see the Lévy–Khinchin formula in Protter (1990). An attractive feature of the argument that leads to (8) is the straightforward manner in which it is developed. In contrast, the argument leading to (7) calls for some appreciation of randomized stopping rules associated with stochastic processes.

In what follows we consider several possible candidates for the process $\{H(t); t \ge 0\}$, starting with the simplest and moving to the more general. In most cases, explicit expressions for $P(T > t)$ are obtained; in others, computations and approximations may be needed.

The choice of which of the following processes to use depends on the application. Presumably, because $H(t)$ encapsulates the resource used until time $t$, the selection of a suitable process for $\{H(t); t \ge 0\}$ would depend on the pattern of use of the item.
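Equations (8) and (9) say that killing $H(t)$ at an independent exponential(1) barrier and averaging $\exp(-H(t))$ are the same computation. A Monte Carlo sketch, assuming for illustration that $H(t)$ has the marginal law of a gamma process (shape $\beta t$, with $\alpha$ entering the density as $e^{-\alpha y}$), compares the empirical frequency of the event $\{X > H(t)\}$ with the exact value $E[\exp(-H(t))] = (\alpha/(\alpha+1))^{\beta t}$.

```python
import random

def survival_by_threshold(t, alpha, beta, n=100_000, seed=1):
    """Estimate P(X > H(t)) with X ~ Exp(1) independent of H(t) ~ Gamma(shape beta*t, rate alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        h = rng.gammavariate(beta * t, 1.0 / alpha)  # gammavariate(shape, scale)
        x = rng.expovariate(1.0)                     # the hazard potential X
        hits += x > h
    return hits / n

def survival_by_laplace(t, alpha, beta):
    """Right side of (9) for a gamma-distributed H(t): E[exp(-H(t))]."""
    return (alpha / (alpha + 1.0)) ** (beta * t)

print(survival_by_threshold(1.0, 2.0, 1.5), survival_by_laplace(1.0, 2.0, 1.5))
```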


6.2  Cumulative  Hazard  Processes  and 
Their  Survival  Functions 

The process $\{H(t); t \ge 0\}$ is required to be nonnegative, nondecreasing, and right-continuous. Thus our choice of candidate processes is limited. Clearly, the Brownian motion process, which has often been used to describe degradation and wear, must be eliminated. However, certain functionals of the Brownian motion, such as the running maxima, are viable candidates, and this is the first process considered.

6.2.1 The Maxima of Brownian Motion. Suppose that $\{W(t); t \ge 0\}$ is a standard Brownian motion process [i.e., $W(0) = 0$]; for any $t > 0$, $W(t)$ has a Gaussian distribution with mean 0 and variance $t$, and $\{W(t); t \ge 0\}$ has stationary independent increments. If we set

$$H(t) = \sup_{0 \le s \le t}\{W(s)\}, \qquad t \ge 0,$$

then the process $\{H(t); t \ge 0\}$ will be continuous, nonnegative, and nondecreasing; this is called a Brownian maximum process. It is well known that $T_x = \inf\{t > 0 : W(t) \ge x\} = \inf\{t > 0 : H(t) \ge x\}$, the time at which the process $\{W(t); t \ge 0\}$ first hits a barrier $x > 0$, has an inverse-Gaussian distribution (see Pettit and Young 1999). Consequently, the hitting time of the process $\{H(t); t \ge 0\}$ to $x$ also has an inverse-Gaussian distribution, specifically,

$$P(T_x \le t) = 2\left(1 - \Phi(x/\sqrt{t})\right),$$

where $\Phi(u) = \int_{-\infty}^{u} (2\pi)^{-1/2} e^{-s^2/2}\,ds$ is the standard normal distribution function.

Because the time to failure of an item is the time at which the process $\{H(t); t \ge 0\}$ first hits the HP $X$, where $X$ is exponential(1),

$$P(T \le t) = \int_0^\infty 2\left(1 - \Phi(x/\sqrt{t})\right)e^{-x}\,dx = 1 - 2e^{t/2}\,\Phi(-\sqrt{t}),$$

so that

$$P(T > t) = 2e^{t/2}\,\Phi(-\sqrt{t}), \tag{11}$$

an expression that is easily evaluated.
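Equation (11) can be evaluated with the complementary error function, because $2\Phi(-a) = \operatorname{erfc}(a/\sqrt{2})$. The sketch below codes it that way and, as an independent check, averages $\exp(-H(t))$ by simulation. The check relies on the reflection principle, under which $\sup_{0 \le s \le t} W(s)$ has the same distribution as $|W(t)|$; the function names are illustrative.

```python
import math
import random

def survival_closed_form(t):
    """Equation (11): P(T > t) = 2 exp(t/2) Phi(-sqrt(t)) = exp(t/2) erfc(sqrt(t/2))."""
    return math.exp(t / 2.0) * math.erfc(math.sqrt(t / 2.0))

def survival_monte_carlo(t, n=100_000, seed=7):
    """Average exp(-H(t)), sampling H(t) as |W(t)| by the reflection principle."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    return sum(math.exp(-abs(rng.gauss(0.0, sd))) for _ in range(n)) / n

for t in (0.25, 1.0, 4.0):
    print(t, survival_closed_form(t), survival_monte_carlo(t))
```

For very large $t$, `math.exp(t / 2)` eventually overflows; `scipy.special.erfcx` computes the same product in a stable way if SciPy is available.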


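6.2.2 The Compound Poisson Process. The compound Poisson process with an (arrival) rate $\lambda$ and iid jumps $J_i$, $i = 1, 2, \ldots$, with $P(J_i \le \omega) = G(\omega)$, is another possible candidate for describing the process $\{H(t); t \ge 0\}$. This process increases only by jumps of size $J_i$, $i = 1, 2, \ldots$. If we assume that the $J_i$'s are also independent of the HP $X$, then, given $\lambda$,

$$P(T > t \mid \lambda) = \sum_{k=0}^{\infty} e^{-\lambda t}\frac{(\lambda t)^k}{k!}\int_0^\infty G^{(k)}(x)\,e^{-x}\,dx,$$

where $G^{(k)}(\cdot)$ is the $k$-fold convolution of $G(\cdot)$ with itself ($G^{(0)}$ being the distribution degenerate at 0). Because $\int_0^\infty G^{(k)}(x)e^{-x}\,dx = \left(\int_0^\infty e^{-x}\,G(dx)\right)^k$, the foregoing simplifies (see EMP 1973) as

$$P(T > t \mid \lambda) = \exp\left[-\lambda t\left(1 - \int_0^\infty e^{-x}\,G(dx)\right)\right], \tag{12}$$

for all $G(\cdot)$ with $G(x) = 0$, $x < 0$.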

When $\lambda > 0$ is unknown, we may average out $P(T > t \mid \lambda)$ with respect to any distribution of $\lambda$. This would lead us to conclude that $P(T > t)$ has a hazard rate function $h(t)$ that is a decreasing function of $t > 0$. Thus items experiencing use of a resource described by a compound Poisson process with an unknown arrival rate will necessarily have lifetimes with a heavy-tailed distribution function. For example, if $\lambda$ has a gamma distribution, then $P(T > t)$ will be a Pareto distribution, which is heavy-tailed.
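A numerical sketch of (12) follows. It assumes, purely for illustration, gamma-distributed jumps, so that $\phi = \int_0^\infty e^{-x}G(dx)$ is explicit; it then sums the Poisson series term by term, compares the sum with the closed form, and averages (12) over a gamma-distributed $\lambda$ (hypothetical shape $a$ and rate $b$) to compare with the Pareto survival function $(b/(b + (1 - \phi)t))^a$.

```python
import math
import random

def jump_laplace(shape, rate):
    """phi = E[exp(-J)] = int e^{-x} G(dx) for a Gamma(shape, rate) jump size J."""
    return (rate / (rate + 1.0)) ** shape

def survival_series(t, lam, phi, k_max=300):
    """Poisson-weighted series, using int G^(k)(x) e^{-x} dx = phi ** k."""
    term, total = math.exp(-lam * t), 0.0     # term = P(k jumps by time t), from k = 0
    for k in range(k_max):
        total += term * phi ** k
        term *= lam * t / (k + 1)
    return total

def survival_given_rate(t, lam, phi):
    """Equation (12): P(T > t | lambda) = exp(-lambda * t * (1 - phi))."""
    return math.exp(-lam * t * (1.0 - phi))

def survival_gamma_mixture(t, a, b, phi):
    """(12) averaged over lambda ~ Gamma(shape a, rate b): a Pareto survival function."""
    return (b / (b + (1.0 - phi) * t)) ** a

phi = jump_laplace(shape=2.0, rate=3.0)       # hypothetical jump distribution G
rng = random.Random(4)
draws = [rng.gammavariate(3.0, 1.0 / 2.0) for _ in range(100_000)]  # lambda ~ Gamma(3, rate 2)
mc = sum(survival_given_rate(1.5, lam, phi) for lam in draws) / len(draws)
print(survival_series(2.0, 1.5, phi), survival_given_rate(2.0, 1.5, phi))
print(mc, survival_gamma_mixture(1.5, 3.0, 2.0, phi))
```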

6.2.3 A Special Markov Process. EMP (1973) considered a Markov process for $\{H(t); t \ge 0\}$ with a special feature that describes proneness to wear. Whereas their interpretation of $H(t)$ is unlike ours, their special feature is appropriate to our setup, specifically (a) $H(0) = 0$; (b) $H(t + \Delta) - H(t) \ge 0$ for all $t, \Delta \ge 0$; and (c) $P(H(t + \Delta) - H(t) \le u \mid H(t) = z)$ is decreasing in $z$ and $t$. The practical import of (c) is that proneness to wear increases with usage. With the foregoing in place, EMP (1973) showed that for any barrier $x$, the hitting time of the process $\{H(t); t \ge 0\}$ has a distribution with a failure rate function $h(u)$ such that $\bar{h}(t)$ increases in $t$, where

$$\bar{h}(t) = \frac{1}{t}\int_0^t h(u)\,du. \tag{13}$$

Such distributions are said to have an increasing hazard rate average property. Because the barrier in our case is the HP $X$, where $X$ has an exponential(1) distribution, we note that for the special Markov process for $\{H(t); t \ge 0\}$, the survival function $P(T > t)$ can be written as an exponential(1) mixture of distributions with the increasing hazard rate average property.


6.2.4 A Nonnegative Lévy Process. An omnibus way of describing $\{H(t); t \ge 0\}$ is through a nonnegative Lévy process, that is, a process that is continuous in probability and has stationary independent increments. Such processes are examples of Markov processes and include the compound Poisson, the gamma, and the nonnegative stable processes as special cases. Furthermore, a Lévy process renews itself at stopping times and has the strong Markov property, and all of the nonnegative Lévy processes are limits of compound Poisson processes (see Protter 1990). Thus the process provides a convenient general platform for describing $\{H(t); t \ge 0\}$ and makes the result of (12) based on the compound Poisson process central. Besides the foregoing generalities, the main attraction of considering a Lévy process stems from the fact that its Laplace transform (given by the Lévy–Khinchin formula) takes a form identical to that of (10), namely

$$P(T > t) = \exp\left\{-t\int_0^\infty \left(1 - \exp(-y)\right)\nu(dy)\right\}, \tag{14}$$

where $\nu(dy)$, the Lévy measure, characterizes both the expected frequency and the size of the jumps (nonnegative in our case) in a Lévy process.

For the compound Poisson process of Section 6.2.2, $\nu(dy) = \lambda G(dy)$, and if $G$ had a gamma distribution with scale $\alpha > 0$ and shape $\beta > 0$, then

$$\nu(dy) = \lambda\alpha^{\beta}y^{\beta-1}e^{-\alpha y}\,dy/\Gamma(\beta).$$

In the case of a gamma process [i.e., when for any $t > 0$, $H(t)$ has a gamma distribution with scale $\alpha > 0$ and shape $\beta t$],

$$\nu(dy) = \beta e^{-\alpha y}y^{-1}\,dy, \tag{15}$$

whereas when $\{H(t); t \ge 0\}$ is described by a stable process,

$$\nu(dy) = \frac{\alpha\beta}{\Gamma(1 - \beta)}\,y^{-(1+\beta)}\,dy \tag{16}$$

for parameters $\alpha > 0$ and $\beta \in (0, 1)$. Plugging (15) and (16) into (14) will give $P(T > t)$ for the special cases of the gamma and the stable process; also see (18) in the next section.
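To see how (14) is used, the Laplace exponent $\int_0^\infty (1 - e^{-y})\nu(dy)$ can be computed numerically for the Lévy measures (15) and (16). The sketch below assumes hypothetical parameter values and uses the substitution $y = e^s$ with the trapezoidal rule; with the normalization written in (16), it should reproduce $\beta\ln(1 + 1/\alpha)$ for the gamma process and $\alpha$ for the stable process.

```python
import math

def laplace_exponent(levy_density, lo=-60.0, hi=60.0, n=24_000):
    """Approximate int_0^inf (1 - e^{-y}) nu(y) dy.
    Substituting y = e^s (so dy = y ds) gives a smooth integrand in s,
    which is integrated with the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = math.exp(lo + i * h)
        f = -math.expm1(-y) * levy_density(y) * y   # (1 - e^{-y}) * nu(y) * y
        total += 0.5 * f if i in (0, n) else f
    return total * h

def survival(t, levy_density):
    """Equation (14): P(T > t) = exp(-t * Laplace exponent)."""
    return math.exp(-t * laplace_exponent(levy_density))

ALPHA, BETA = 2.0, 1.5          # hypothetical gamma-process parameters, as in (15)
A_STABLE, B_STABLE = 0.7, 0.6   # hypothetical stable-process parameters, as in (16)

def gamma_nu(y):
    return BETA * math.exp(-ALPHA * y) / y

def stable_nu(y):
    return A_STABLE * B_STABLE / math.gamma(1.0 - B_STABLE) * y ** (-(1.0 + B_STABLE))

print(laplace_exponent(gamma_nu), BETA * math.log1p(1.0 / ALPHA))
print(laplace_exponent(stable_nu), A_STABLE)
```

Under these assumptions, (14) gives $P(T > t) = (\alpha/(\alpha+1))^{\beta t}$ in the gamma case and $P(T > t) = e^{-\alpha t}$, an exponential lifetime, in the stable case.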


6.2.5 Continuous and Increasing Strong Markov Processes. One of the more striking results in stochastic process theory pertains to continuous and increasing processes that have the strong Markov property. It has been shown that such processes have deterministic paths up to random killing; that is, a continuous increasing strong Markov process is essentially deterministic. This result dates back to the work of Blumenthal, Getoor, and McKean (1962). Loosely speaking, if $\{H(t); t \ge 0\}$ is an increasing, continuous, strong Markov process with a state space of the form $[a, b)$, then there exists a strictly increasing continuous function $k(\cdot)$ on the state space such that for all $t \ge 0$, $H(t) = k^{-1}[k(H(0)) + t]$; for specifics, see Corollary 1 of Cinlar (1979). Thus the sample path of the $\{H(t); t \ge 0\}$ process is a deterministic function of the initial state of the process, namely $H(0) = 0$, and time $t$. Once the process $\{H(t); t \ge 0\}$ is considered (essentially) deterministic, obtaining the hitting time of $H(t)$ to a barrier is relatively straightforward; it is also deterministic if the barrier is a known constant. Randomness of hitting times enters into the picture when the barrier is random, which is so in our case.

As an illustration of the foregoing, suppose that the process $\{H(t); t \ge 0\}$ is an increasing Lévy process. Recall that Lévy processes are continuous, have stationary independent increments, and thus are strong Markov. When this is the case, the function $k(\cdot)$ is such that for some $a \ge 0$,

$$H(t) = at + \int_0^\infty \left(1 - \exp(-ut)\right)\nu(du), \tag{17}$$

where $\nu(du)$ is the Lévy measure of the process.

If we set $a = 0$ and assume that $\{H(t); t \ge 0\}$ is a gamma process, then $\nu(du)$ is given by (15), and the deterministic cumulative hazard function turns out to be

$$H(t) = \beta\log\left(1 + \frac{t}{\alpha}\right), \qquad t \ge 0 \tag{18}$$

(see Kebir 1991 for more details).

The unit fails when $H(t)$ gets killed by a threshold $x$; that is, $T_x$, the time to failure for a fixed threshold $x$, is

$$T_x = \alpha\left(e^{x/\beta} - 1\right).$$

Averaging with respect to the exponential(1) distribution of the threshold, we have

$$P(T > t) = \left(1 + \frac{t}{\alpha}\right)^{-\beta},$$

which is a Pareto distribution. Note that the Pareto distribution also arises in the context of a compound Poisson process for $\{H(t); t \ge 0\}$ when the distribution of $\lambda$, the arrival rate, is assumed to be a gamma; see the discussion after (12).
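The gamma-process example can be checked end to end. The sketch below, with hypothetical $\alpha$ and $\beta$, evaluates (17) numerically with $a = 0$ and the Lévy measure (15), compares the result with (18), inverts (18) to obtain $T_x$, and averages over an exponential(1) threshold to recover the Pareto survival function.

```python
import math
import random

ALPHA, BETA = 2.0, 1.5   # hypothetical gamma-process parameters in (15)

def cum_hazard_numeric(t, lo=-40.0, hi=40.0, n=16_000):
    """Equation (17) with a = 0 and nu(du) = BETA * exp(-ALPHA * u) / u du,
    evaluated with the substitution u = e^s and the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = math.exp(lo + i * h)
        f = -math.expm1(-u * t) * BETA * math.exp(-ALPHA * u)  # (1 - e^{-ut}) * nu(u) * u
        total += 0.5 * f if i in (0, n) else f
    return total * h

def cum_hazard_closed(t):
    """Equation (18): H(t) = BETA * log(1 + t / ALPHA)."""
    return BETA * math.log1p(t / ALPHA)

def time_to_failure(x):
    """T_x = ALPHA * (exp(x / BETA) - 1), the time at which (18) reaches x."""
    return ALPHA * math.expm1(x / BETA)

def survival_mc(t, n=100_000, seed=2):
    """Fraction of exponential(1) thresholds X whose failure time T_X exceeds t."""
    rng = random.Random(seed)
    return sum(time_to_failure(rng.expovariate(1.0)) > t for _ in range(n)) / n

for t in (0.5, 2.0, 10.0):
    print(t, cum_hazard_numeric(t), cum_hazard_closed(t), survival_mc(t), (1 + t / ALPHA) ** (-BETA))
```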

To summarize, in practically all of the cases that we have considered so far, closed-form expressions for $P(T > t)$ are available. The sole exception is the special Markov process of Section 6.2.3, for which our result is merely qualitative. Our final case, considered next, pertains to an exponential functional of Brownian motion; here a closed-form result is not available. We chose this case because of its novelty and plausible applicability.


6.2.6 Integrated Geometric Brownian Motion Process. In Section 6.2.1 we considered the running maximum of a standard Brownian motion as a model for $\{H(t); t \ge 0\}$. Here we consider another functional. Specifically, let

$$H(t) = \int_0^t \exp(2W(s))\,ds, \tag{19}$$

where $W(s)$ is a standard Brownian motion. We choose the scalar 2 for convenience; its role will become clear in the sequel. Observe that $\exp(2W(s))$ is always positive and that $H(t)$ is continuous and strictly increasing in $t$. Recall that a Brownian motion has continuous sample paths. Whereas $\sup_{0 \le s \le t}\{W(s)\}$ is merely nondecreasing in $t$ (it stays flat whenever $W$ is below its running maximum), the $H(t)$ of (19) is a strictly increasing function of $t$. As stated earlier, Brownian motion has often been used to describe crack growth and degradation. The foregoing transformation of the process is necessitated by the requirement that $H(t)$ be nonnegative and nondecreasing. Our sense is that the $H(t)$ of (19) also could be a viable candidate for describing degradation and wear.


With the foregoing in place, we let

$$T_x = \inf\{t \ge 0 : H(t) = x\}, \tag{20}$$

for some barrier $x > 0$; that is, $T_x$ is the hitting (killing) time of the process $\{H(t); t \ge 0\}$ to a threshold $x$. Because $H(t)$ is continuous and increasing, we have that

$$P(T_x > t) = P(H(t) < x). \tag{21}$$


To evaluate the right side of the foregoing, we need to know the density of $H(t)$ for a fixed value of $t$. For convenience, we denote $H(t)$ by $H_t$ and, following the notation of Yor (1992), note that

$$\frac{P(H_t \in dv)}{dv} = \frac{1}{\pi v^{3/2}\sqrt{t}}\int_0^\infty \exp\left(\frac{\pi^2}{2t} - \frac{y^2}{2t} + \frac{\cosh^2 y - 1}{2v}\right)\sinh y\,\sin\left(\frac{\pi y}{t}\right)\Phi\left(-\frac{\cosh y}{\sqrt{v}}\right)dy,$$

where $\Phi(u)$ is as defined in Section 6.2.1. Consequently,

$$P(T_x > t) = \int_0^x P(H_t \in dv), \tag{22}$$

from which it follows that

$$P(T > t) = \int_0^\infty P(T_x > t)\,e^{-x}\,dx = \int_0^\infty\int_0^x P(H_t \in dv)\,e^{-x}\,dx, \tag{23}$$

with $P(H_t \in dv)$ as given earlier.
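Because (23) has no closed form, simulation is a natural way to evaluate it. Using (9), $P(T > t) = E[\exp(-H(t))]$, the sketch below simulates Brownian paths on a grid, approximates (19) by the trapezoidal rule, and averages $\exp(-H(t))$. The grid size and path count are arbitrary choices, and the estimate carries both Monte Carlo and discretization error. Jensen's inequality with $E[H(t)] = (e^{2t} - 1)/2$ supplies a simple lower bound for comparison.

```python
import math
import random

def survival_igbm(t, n_paths=4_000, n_steps=100, seed=11):
    """Monte Carlo estimate of P(T > t) = E[exp(-H(t))], H(t) = int_0^t exp(2 W(s)) ds,
    with the integral approximated by the trapezoidal rule on n_steps intervals."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    acc = 0.0
    for _ in range(n_paths):
        w, prev, h = 0.0, 1.0, 0.0          # W(0) = 0, so exp(2 W(0)) = 1
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)         # Brownian increment over dt
            cur = math.exp(2.0 * w)
            h += 0.5 * (prev + cur) * dt
            prev = cur
        acc += math.exp(-h)                 # P(X > H(t) | path) for X ~ Exp(1)
    return acc / n_paths

def jensen_lower_bound(t):
    """exp(-E[H(t)]) with E[H(t)] = (exp(2t) - 1) / 2; a lower bound on P(T > t)."""
    return math.exp(-0.5 * math.expm1(2.0 * t))

for t in (0.25, 0.5, 1.0):
    print(t, survival_igbm(t), jensen_lower_bound(t))
```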


7.  COMPETING-RISK  AND 
DEGRADATION  PROCESSES 

7.1  Competing  Risks  and  Competing-Risk  Processes 

Loosely speaking, the term “competing risks” connotes competing causes of failure, and interest centers on the cause of failure and/or the time to failure given that there are several agents competing for an item's lifetime. The issue can be quite complex because the causes do not operate in isolation of one another, it often being the case that one cause exacerbates the effect of the other. Traditionally, the model used for encapsulating the scenario of failure under competing risks is the reliability of a series system with independent (or dependent) component lifetimes, the latter representing the causes of system failure. In what follows, we shift focus from independent or dependent lifetimes to independent or dependent HPs to develop a framework that could provide a more realistic description of the competing-risk phenomenon. Accordingly, let $T_i$ denote the time to failure of the $i$th component of a series system of $k$ components, $i = 1, \ldots, k$, and $T$ the time to failure of the system. Then

$$P(T > t) = P(H_1(t) < X_1, \ldots, H_k(t) < X_k),$$

where $H_i(t)$ is the cumulative hazard (or risk) experienced by the $i$th component and $X_i$ is its HP. If the HPs are assumed to be independent, then

$$P(T > t) = \exp\left[-\left(H_1(t) + \cdots + H_k(t)\right)\right], \tag{24}$$

suggesting an additivity of the cumulative hazards (or risks). If the HPs are assumed to be dependent, then the nature of dependence would dictate the form taken by $P(T > t)$; see Section 5. In either case, our expression for $P(T > t)$ would be the same as what we would obtain assuming the dependence or independence of the lifetimes $T_i$. Thus it would appear that little, if any, gain has been achieved by shifting focus from the $T_i$'s to the $X_i$'s. But there is another way to look at (24), a way that paves the path for obtaining another expression for the survival function of an item experiencing multiple risks.

Observe that (24) is also the survival function of a single item that has a cumulative hazard of $H(t) = \sum_{i=1}^{k} H_i(t)$ at time $t$. But when this is the case, how can we interpret each $H_i(t)$? More generally, in the case of a single item with a cumulative hazard of $H(t)$, can there be a meaningful decomposition of $H(t)$, and, if so, can it be additive? Moreover, which of the two perspectives more accurately reflects the competing-risk phenomenon?

One possible strategy for addressing these questions is to see each $H_i(t)$, $i = 1, \ldots, k$, as the consequence of a covariate and to suppose that if the item were to experience covariate $i$ alone, then its time to failure would coincide with the time at which $H_i(t)$ crossed its hazard potential $X$. With the item simultaneously experiencing $k$ covariates, its survival function would be

$$\begin{aligned} P(T > t) &= P(H_1(t) < X, \ldots, H_k(t) < X) \\ &= P(X > \max\{H_1(t), \ldots, H_k(t)\}) \\ &= \exp\left(-\max\{H_1(t), \ldots, H_k(t)\}\right). \end{aligned} \tag{25}$$

Clearly, under the scenario of an item simultaneously experiencing $k$ causes of failure (risks), the decomposition of $H(t)$ is not additive.

Whereas (25) could be new to the literature on competing risks, it is worth noting that the two scenarios discussed earlier are related: the traditional one involving a series system, which leads to (24), and the one pertaining to the single item, which leads to (25). They are related because considering a single HP $X$ is tantamount to considering $k$ HPs that are totally (and positively) dependent on one another. This leads to the following result.

Theorem 4. The survival function under any series system model for competing risks with positively dependent hazard potentials is bounded as

$$\exp\left(-\sum_{i=1}^{k} H_i(t)\right) \le P(T > t) \le \exp\left(-\max\{H_1(t), \ldots, H_k(t)\}\right).$$

This theorem shows that the two perspectives on competing-risk modeling can be reconciled through the notion of independent and dependent hazard potentials, with the left side of the inequality reflecting the former and the right side reflecting the latter.
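Theorem 4 can be illustrated by simulation. The sketch assumes, for illustration only, that the HPs are unit exponentials coupled by an equicorrelated Gaussian copula with correlation $\rho \ge 0$, one convenient form of positive dependence. Here $\rho = 0$ gives independent HPs and $\rho = 1$ a single shared HP, so the estimates should move from the lower bound toward the upper bound; the values of $H_i(t)$ are hypothetical.

```python
import math
import random

def unit_exponential(z):
    """Map a standard normal z to a unit exponential: X = -ln(1 - Phi(z))."""
    return -math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def series_survival(hazards, rho, n=50_000, seed=3):
    """Estimate P(X_1 > H_1(t), ..., X_k > H_k(t)) when the HPs are unit exponentials
    coupled through an equicorrelated Gaussian copula with correlation rho in [0, 1]."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    survived = 0
    for _ in range(n):
        common = rng.gauss(0.0, 1.0)   # shared factor that induces positive dependence
        if all(unit_exponential(a * common + b * rng.gauss(0.0, 1.0)) > h for h in hazards):
            survived += 1
    return survived / n

hazards = [0.2, 0.5, 0.3]          # hypothetical values of H_1(t), H_2(t), H_3(t)
lower = math.exp(-sum(hazards))    # independent HPs, equation (24)
upper = math.exp(-max(hazards))    # a single shared HP, equation (25)
for rho in (0.0, 0.5, 1.0):
    print(rho, lower, series_survival(hazards, rho), upper)
```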

7.1.1 Dependent Competing Risks and Competing Risk Processes. In our discussion thus far, the $H_i(t)$'s have been assumed known and specified. Consequently, the matter of independent or dependent competing risks was not germane; dependence and independence were embodied in the context of HPs. But the prevailing view of what constitutes dependent competing risks entails considering dependent lifetimes in the series system model mentioned earlier. We consider this approach circuitous. A proper framework for discussing dependent competing risks requires that the $H_i(t)$'s be random; a comprehensive way of doing this is to assume a stochastic process model $\{H_i(t); t \ge 0\}$, $i = 1, \ldots, k$, as was done in Section 6. We call such a model a competing-risk process, and call the $k$-variate process $\{H_1(t), \ldots, H_k(t); t \ge 0\}$ a dependent competing-risk process if the $H_i(t)$'s are interdependent. A unit fails when any one of the $k$ marginal processes $\{H_i(t); t \ge 0\}$, $i = 1, \ldots, k$, hits the item's HP $X$. Interdependence of the $H_i(t)$'s will induce dependence between the corresponding lifetimes $T_i$, $i = 1, \ldots, k$. Thus the prevailing notion of what constitutes dependent competing risks will be sustained, albeit more as a consequence than as a fundamental construct. Viewing the competing-risk scenario from the standpoint of hitting the HP offers a convenient platform for appreciating the phenomenon of lifetimes under dependent competing risks.

Having stated the foregoing, the question still remains as to what would be suitable models for the $k$-variate process $\{H_1(t), \ldots, H_k(t); t \ge 0\}$, where the marginal processes $\{H_i(t); t \ge 0\}$, $i = 1, \ldots, k$, are such that each $H_i(t)$ is nondecreasing in $t$. One possibility would be to let each marginal process be a Brownian maximum process of Section 6.2.1 and deduce the interdependence between the marginal processes from the assumed dependence of the $k$-variate Brownian motion process that generates the Brownian maximum processes. The specifics remain to be worked out. Another possibility, in the case where $k = 2$, is to assume that $\{H_1(t); t \ge 0\}$ is a nonnegative, nondecreasing, and right-continuous process of the type discussed in Section 6.2, but that the sample path of $\{H_2(t); t \ge 0\}$ is an impulse function of the form $H_2(t) = 0$ for all $t \ne t^*$, and $H_2(t^*) = \infty$, for some $t = t^* > 0$, where the rate of impulse occurrence depends on the state of the process $\{H_1(t); t \ge 0\}$. Such a model may be meaningful when the process $\{H_1(t); t \ge 0\}$ can be identified with, say, degradation and the process $\{H_2(t); t \ge 0\}$ can be identified with some form of trauma with a rate of occurrence depending on the state of the degradation process. Here degradation and trauma compete with each other for the lifetime of the system. Lemoine and Wenocur (1985) and Wenocur (1989) have proposed the foregoing as a framework for failure modeling, although not in the context of competing risks. With appropriate modifications, their results could be adapted for the competing-risk scenario.
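The degradation-trauma scheme can be sketched by simulation. In the hypothetical version below, which is not the model of Lemoine and Wenocur, $H_1$ is a gamma process, trauma arrives with intensity $c\,H_1(t)$ (held constant over each time step), and the unit fails at the first trauma or when $H_1$ reaches the HP $X$. With $c = 0$ the estimate should match the gamma-process value $(\alpha/(\alpha+1))^{\beta t}$; with $c > 0$ trauma can only shorten life.

```python
import math
import random

def survives(rng, t0, alpha, beta, c, n_steps=40):
    """Simulate one hypothetical unit on [0, t0]. H1 is a gamma process with increments
    Gamma(beta*dt, scale 1/alpha); trauma (H2 jumping to infinity) arrives with intensity
    c * H1(t). Returns False at the first trauma or when H1 reaches the HP X."""
    dt = t0 / n_steps
    x = rng.expovariate(1.0)                            # hazard potential X ~ Exp(1)
    h1 = 0.0
    for _ in range(n_steps):
        if rng.random() < -math.expm1(-c * h1 * dt):    # trauma during this step
            return False
        h1 += rng.gammavariate(beta * dt, 1.0 / alpha)  # degradation increment
        if h1 >= x:                                      # degradation reaches the HP
            return False
    return True

def survival(t0, alpha, beta, c, n=5_000, seed=5):
    rng = random.Random(seed)
    return sum(survives(rng, t0, alpha, beta, c) for _ in range(n)) / n

alpha, beta = 2.0, 1.5
print(survival(1.0, alpha, beta, c=0.0), (alpha / (alpha + 1.0)) ** beta)  # degradation only
print(survival(1.0, alpha, beta, c=1.0))                                    # degradation and trauma
```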

7.2  Degradation  and  Aging  Processes 

Much has been written on what is known as “degradation modeling” and reliability assessment using degradation data. The thinking here has been that degradation is an observable phenomenon and that failure occurs when the level of degradation hits some threshold (see Doksum 1991). What the threshold should be and how it should be specified has not been made clear. Our review of the engineering and materials science literature on degradation suggests that this viewpoint is questionable. This is because degradation is viewed as the irreversible accumulation of damage throughout life that ultimately leads to failure (see Bogdanoff and Kozin 1985, p. 1). Whereas the term “damage” itself is not defined, it is claimed that damage manifests as cracks, corrosion, physical wear (depletion of material), and so on. Similarly, with regard to aging, a review of the literature on longevity and mortality indicates that aging pertains to a unit's position in a state space in which the probabilities of failure are greater than in a former position and that the manifestations of aging are the biomedical and physical difficulties experienced by older individuals.

Thus it appears that both degradation and aging are abstract constructs that cannot be observed and thus cannot be measured. However, these constructs serve to describe a process that results in failure and can be viewed as the cause of observables such as crack growth and corrosion, which can be measured. Thus the question arises as to how one can mathematically model the degradation phenomenon and relate it to the observables mentioned earlier. Put another way, how can we mathematically describe the cause and effect phenomenon of degradation and the observables that it spawns? Our proposal is to treat the former as a cumulative hazard process and the latter as a covariate (or a marker) process that is influenced by the former (see, e.g., Whitmore, Crowder, and Lawless 1998). This point of view may fit well with Aalen's (1987) proposal that matters of causality be handled by stochastic process models. As before, the item fails when the cumulative hazard process hits the item's HP $X$. With the foregoing in mind, we define a degradation process as a bivariate stochastic process $\{H(t), Z(t); t \ge 0\}$, with $H(t)$ representing the unobserved cumulative hazard, and $Z(t)$ representing an observable marker that is a precursor to failure. In principle, $\{Z(t); t \ge 0\}$, the marker process, can also be a vector stochastic process. Whereas $H(t)$ is required to be nondecreasing, there is no such restriction on $Z(t)$; cracks can be repaired and sometimes do heal.

7.2.1 Specifying Degradation Processes. When the marker process can be meaningfully described by a Markov process, for which there is some precedent when the marker is crack growth (see Sobczyk 1987), the degradation process $\{H(t), Z(t); t \ge 0\}$ can be taken to be Cinlar's (1972) Markov additive process (MAP). When this is the case, $\{H(t); t \ge 0\}$ is a Lévy process with parameters depending on the state of the $\{Z(t); t \ge 0\}$ process. Another way to link the two processes in question is to use Cox's (1972) proportional hazards model or Aalen's (1989) additive hazards model, in which linkage is achieved through the processes $\{h(t); t \ge 0\}$ and $\{Z(t); t \ge 0\}$. The ramifications of the foregoing, as well as the MAP, remain to be explored. Our main purpose here is to propose a different approach for examining the degradation phenomenon and the role of the HP in analyzing it.

8.  THE  HAZARD  GRADIENT  AND  CONDITIONAL 
HAZARD  POTENTIALS 


The purposes of this section are to obtain a generalization of Theorem 1 and to further explore the ramifications of dependent life-lengths and dependent HPs. We start with the notion of a “hazard gradient” and provide a strategy through which a collection of dependent lifetimes can be replaced by a collection of independent ones.

Let $T_1, \dots, T_n$ be a collection of $n$ lifetimes, and let $P(T_1 > t_1, \dots, T_n > t_n) = R(t_1, \dots, t_n)$ be its survival function. Let $\mathbf{t} = (t_1, \dots, t_n)$ be such that $R(\mathbf{t}) > 0$. The quantity $H(\mathbf{t}) = -\ln R(\mathbf{t})$ is the multivariate analog of $H(t)$. Suppose that $H(\mathbf{t})$ has a gradient $\mathbf{r}(\mathbf{t}) = (r_1(\mathbf{t}), \dots, r_n(\mathbf{t}))$, where $r_i(\mathbf{t}) = \partial H(\mathbf{t}) / \partial t_i$. The quantity $\mathbf{r}(\mathbf{t})$ is called the hazard gradient of $R(\mathbf{t})$ (see Marshall 1975a).

The relationship among $H(\mathbf{t})$, $R(\mathbf{t})$, and $\mathbf{r}(\mathbf{u})$ is expressed through

$$H(\mathbf{t}) = \int_{\mathbf{0}}^{\mathbf{t}} \mathbf{r}(\mathbf{u}) \cdot d\mathbf{u} \tag{26}$$

and

$$P(T_1 > t_1, \dots, T_n > t_n) = \exp\left[-\int_{\mathbf{0}}^{\mathbf{t}} \mathbf{r}(\mathbf{u}) \cdot d\mathbf{u}\right], \tag{27}$$

where the integral is a line integral along any smooth path from $\mathbf{0}$ to $\mathbf{t}$ (it does not depend on the path because $\mathbf{r}$ is a gradient).

Marshall (1975a) gave a decomposition of $H(\mathbf{t})$ that is noteworthy due to its role in allowing us to prove Theorem 5. Specifically,

$$H(\mathbf{t}) = \int_0^{t_1} r_1(u_1, 0, \dots, 0)\, du_1 + \int_0^{t_2} r_2(t_1, u_2, 0, \dots, 0)\, du_2 + \cdots + \int_0^{t_n} r_n(t_1, \dots, t_{n-1}, u_n)\, du_n, \tag{28}$$

where $r_1(u_1, 0, \dots, 0)$ is the failure rate of $T_1$ at $u_1$, and $r_i(t_1, \dots, t_{i-1}, u_i, 0, \dots, 0)$ is the (conditional) failure rate of $T_i$ at $u_i$, were it so that $T_1 > t_1, \dots, T_{i-1} > t_{i-1}$.

The first term on the right side of (28) is the cumulative hazard of $T_1$ at $t_1$ and is denoted by $H_1(t_1)$. The second term is the integral of the conditional hazard of $T_2$ at $u_2$ given that $T_1 > t_1$; it is denoted by $H_2(t_2 \mid t_1)$. Similarly, the last term is denoted by $H_n(t_n \mid t_1, \dots, t_{n-1})$. Thus

$$H(\mathbf{t}) = H_1(t_1) + H_2(t_2 \mid t_1) + \cdots + H_n(t_n \mid t_1, \dots, t_{n-1}),$$

and because $R(\mathbf{t}) = \exp(-H(\mathbf{t}))$,

$$P(T_1 > t_1, \dots, T_n > t_n) = \exp[-H_1(t_1)]\, \exp[-H_2(t_2 \mid t_1)] \cdots \exp[-H_n(t_n \mid t_1, \dots, t_{n-1})]. \tag{29}$$

Clearly, $\exp[-H_1(t_1)] = P(T_1 > t_1)$, and, using arguments that parallel those leading us to (1), we can see that for any $n \ge 2$,

$$\exp[-H_n(t_n \mid t_1, \dots, t_{n-1})] = P(T_n > t_n \mid T_1 > t_1, \dots, T_{n-1} > t_{n-1}). \tag{30}$$

Let $X_1, \dots, X_n$ be the HPs corresponding to the lifetimes $T_1, \dots, T_n$ and the cumulative hazards $H_1(t_1), \dots, H_n(t_n)$. Then, a consequence of the relationship (29) is that

$$\begin{aligned} P(T_n > t_n \mid T_1 > t_1, \dots, T_{n-1} > t_{n-1}) &= P(X_n > H_n(t_n) \mid X_1 > H_1(t_1), \dots, X_{n-1} > H_{n-1}(t_{n-1})) \\ &= \exp[-H_n(t_n \mid t_1, \dots, t_{n-1})]. \end{aligned} \tag{31}$$

Because $T_1, \dots, T_n$ are not independent, the HPs $X_1, \dots, X_n$ are, by virtue of Remark 1, also not independent. However, the right-hand side of (31) is the survival function of an exponentially distributed random variable, say $X_n^*$, with a scale parameter of 1, evaluated at $H_n(t_n \mid t_1, \dots, t_{n-1})$. Thus, from (30), we have the result that for all $n \ge 2$,

$$\begin{aligned} P(T_n > t_n \mid T_1 > t_1, \dots, T_{n-1} > t_{n-1}) &= P(X_n > H_n(t_n) \mid X_1 > H_1(t_1), \dots, X_{n-1} > H_{n-1}(t_{n-1})) \\ &= P(X_n^* > H_n(t_n \mid t_1, \dots, t_{n-1})). \end{aligned}$$
The quantity $X_n^*$ is called the conditional HP of the $n$th item; its unit exponential distribution is indexed by $H_n(t_n \mid t_1, \dots, t_{n-1})$. In contrast, $X_n$, the HP of the $n$th item, has a unit exponential distribution indexed by $H_n(t_n)$.

Similarly, corresponding to each term on the right side of (28) except the first, there exist random variables $X_2^*, \dots, X_{n-1}^*$, independent of one another, and also of $X_n^*$, such that

$$P(T_1 > t_1, \dots, T_n > t_n) = P(X_1 > H_1(t_1))\, P(X_2^* > H_2(t_2 \mid t_1)) \cdots P(X_n^* > H_n(t_n \mid t_1, \dots, t_{n-1})).$$

We have now proved, as a multivariate analog to Theorem 1, the following result.

Theorem 5. Corresponding to every collection of nonnegative variables $T_1, \dots, T_n$ having a survival function $R(t_1, \dots, t_n)$, there exists a collection of $n$ independent and exponentially distributed random variables $X_1, X_2^*, \dots, X_n^*$, with scale parameter 1; $X_1$ is indexed on $H_1(t_1)$, and for $n \ge 2$, $X_n^*$ is indexed on $H_n(t_n \mid t_1, \dots, t_{n-1})$.
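To make the decomposition (28) through (30) concrete, here is a small numerical sketch for $n = 2$. It assumes, purely for illustration, Gumbel's bivariate exponential survival function $R(t_1, t_2) = \exp(-t_1 - t_2 - \theta t_1 t_2)$ with a hypothetical $\theta = 0.5$; the function names are ours, not the article's. The hazard gradient is approximated by central differences and integrated along the axis-by-axis path used in (28); the product $\exp[-H_1(t_1)]\exp[-H_2(t_2 \mid t_1)]$ is then compared with $R$, as in (29), and $\exp[-H_2(t_2 \mid t_1)]$ with $P(T_2 > t_2 \mid T_1 > t_1)$, as in (30).

```python
import math

THETA = 0.5  # hypothetical dependence parameter for Gumbel's bivariate exponential


def R(t1, t2, theta=THETA):
    """Joint survival function P(T1 > t1, T2 > t2) (illustrative Gumbel form)."""
    return math.exp(-t1 - t2 - theta * t1 * t2)


def H(t1, t2):
    """Multivariate cumulative hazard H(t) = -ln R(t)."""
    return -math.log(R(t1, t2))


def hazard_gradient(t1, t2, h=1e-6):
    """Central finite-difference approximation to r(t) = (dH/dt1, dH/dt2)."""
    r1 = (H(t1 + h, t2) - H(t1 - h, t2)) / (2 * h)
    r2 = (H(t1, t2 + h) - H(t1, t2 - h)) / (2 * h)
    return r1, r2


def midpoint_integral(g, a, b, n=1000):
    """Composite midpoint rule for a scalar function g on [a, b]."""
    w = (b - a) / n
    return w * sum(g(a + (k + 0.5) * w) for k in range(n))


def marshall_decomposition(t1, t2):
    """The two terms of (28) for n = 2: H1(t1) and H2(t2 | t1)."""
    # Leg 1: along the t1-axis with t2 = 0 (failure rate of T1).
    H1 = midpoint_integral(lambda u1: hazard_gradient(u1, 0.0)[0], 0.0, t1)
    # Leg 2: t1 held fixed, move along t2 (conditional failure rate of T2 given T1 > t1).
    H2_given_1 = midpoint_integral(lambda u2: hazard_gradient(t1, u2)[1], 0.0, t2)
    return H1, H2_given_1


H1, H21 = marshall_decomposition(1.2, 0.7)
print(H1, H21)                                   # close to 1.2 and 0.7 + THETA * 1.2 * 0.7
print(math.exp(-H1) * math.exp(-H21), R(1.2, 0.7))  # (29): the two sides agree
print(math.exp(-H21), R(1.2, 0.7) / R(1.2, 0.0))    # (30): conditional survival of T2
```

For this survival function the terms are available in closed form, $H_1(t_1) = t_1$ and $H_2(t_2 \mid t_1) = t_2 + \theta t_1 t_2$, so the sketch only checks the bookkeeping; for other choices of $R$ the same two path integrals would have to be computed numerically.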

9.  SUMMARY 

In this article we have described a unifying, context-independent perspective on the process leading to the failure of items. This perspective is made possible through the notion of an HP. Besides providing an alternative means of conceptualizing the failure process, the HP provides a means by which the nature of dependence between the lifetimes can be understood and exploited. With respect to the latter, we can generate (new) families of multivariate failure distributions using multivariate exponentials with unit exponential marginals as seeds. For items required to operate in dynamic environments, the HP provides a vehicle by which new families of univariate survival functions can be obtained. This is achieved by establishing a connection between the failure process and the killing times of continuous and increasing stochastic processes by a random barrier, which is the HP. The notion of an HP generalizes to a nonexponential distribution for the barrier and also to the multivariate case. To conclude, the importance of the notion of an HP stems from its ability to provide a different perspective on failure, a model for the cause of dependence of lifetimes, new multivariate models for failure, new univariate models for survival in dynamic environments, and a perspective on competing risks and degradation modeling.

This article is expository in the sense that it provides a feel for the foregoing possibilities. Clearly, more can be done. For one, stochastic processes other than those considered in Section 6.2 can be investigated. More can also be done on the covariates that drive the $\{H(t);\ t \ge 0\}$ process. Another possibility would be to consider bivariate processes and their killing times by interdependent barriers. In regard to the latter, one may also be able to leverage the idea for assessing competing risks by looking at the bivariate cumulative hazard process. Finally, there is the matter of statistical inference and model validation, topics that have not been touched on here. The possibilities of further capitalizing on the notion of an HP are promising for reliability theorists, survival analysts, and actuarial scientists.

[Received November 2005. Revised June 2006.]
2. Introduction: The hazard potential

Let $T$ denote the time to failure of a unit that is scheduled to operate in some specified static environment. Let $h(t)$ be the hazard rate function of the survival function of $T$, namely, $P(T > t)$, $t \ge 0$. Let $H(t) = \int_0^t h(u)\, du$ be the cumulative hazard function at $t$; $H(t)$ is increasing in $t$. With $h(t)$, $t \ge 0$, specified, it is well known that

$$\Pr(T > t;\ h(t), t \ge 0) = \exp(-H(t)).$$

Consider now an exponentially distributed random variable $X$, with scale parameter $\lambda$, $\lambda > 0$. Then for some $H(t) \ge 0$,

$$\Pr(X > H(t) \mid \lambda = 1) = \exp(-H(t));$$

thus

$$\Pr(T > t;\ h(t), t \ge 0) = \exp(-H(t)) = \Pr(X > H(t) \mid \lambda = 1). \tag{2.1}$$

The right-hand side of the above equation says that the item in question will fail when its cumulative hazard $H(t)$ crosses a threshold $X$, where $X$ has a unit exponential distribution. Singpurwalla [11] calls $X$ the Hazard Potential of the item, and interprets it as an unknown resource that the item is endowed with at inception. Furthermore, $H(t)$ is interpreted as the amount of resource consumed by time $t$, and $h(t)$ is the rate at which that resource gets consumed. Looking at the failure process in terms of an endowed and a consumed resource enables us to characterize an environment as being normal when $H(t) = t$, and as being accelerated (decelerated) when $H(t) > (<)\ t$. More importantly, with $X$ interpreted as an unknown resource, we are able to interpret dependent lifetimes as the consequence of dependent hazard potentials, the latter being a manifestation of commonalities of design, manufacture, or genetic make-up. Thus one way to generate dependent lifetimes, say $T_1$ and $T_2$, is to start with a bivariate distribution of $(X_1, X_2)$ whose marginal distributions are exponential with scale parameter one, and which is not the product of exponential marginals. The details are in Singpurwalla [11].
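Both halves of the paragraph above lend themselves to simulation. The sketch below first checks (2.1) by drawing $X$ from a unit exponential and setting $T = H^{-1}(X)$, using a hypothetical $H(t) = t^2$ (a Weibull cumulative hazard). It then generates dependent lifetimes from dependent hazard potentials; for the bivariate distribution of $(X_1, X_2)$ it uses a Marshall-Olkin common-shock construction, which has unit exponential marginals but is not their product. That choice, the parameter `lam12`, and the two cumulative hazards are assumptions made for illustration, not the construction in Singpurwalla [11].

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# (2.1): with X ~ exponential(1), the lifetime is T = H^{-1}(X).
# Illustrative cumulative hazard H(t) = t**2, hence H^{-1}(x) = sqrt(x).
X = rng.exponential(scale=1.0, size=n)
T = np.sqrt(X)
print(np.mean(T > 1.0), np.exp(-1.0))      # empirical survival vs exp(-H(1))

# Dependent hazard potentials with unit exponential marginals.
# Marshall-Olkin common-shock construction (our illustrative choice):
# X_i = min(E_i, E_12) with rates (1 - lam12) and lam12, so each X_i ~ exponential(1).
lam12 = 0.4
E1 = rng.exponential(scale=1.0 / (1.0 - lam12), size=n)
E2 = rng.exponential(scale=1.0 / (1.0 - lam12), size=n)
E12 = rng.exponential(scale=1.0 / lam12, size=n)
X1, X2 = np.minimum(E1, E12), np.minimum(E2, E12)

# Hypothetical environments: H1(t) = t (normal), H2(t) = 2t (accelerated).
T1 = X1          # H1^{-1}(x) = x
T2 = X2 / 2.0    # H2^{-1}(x) = x / 2
print(X1.mean(), X2.mean())                # both close to 1
print(np.corrcoef(T1, T2)[0, 1])           # close to lam12 / (2 - lam12) = 0.25
```

The lifetimes inherit their dependence entirely from the hazard potentials: the two environments only rescale time, which leaves the correlation unchanged.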

When the environment is dynamic, the rate at which an item’s resource gets consumed is random. Thus $\{h(t);\ t \ge 0\}$ is better described as a stochastic process, and consequently, so is $\{H(t);\ t \ge 0\}$. Since $H(t)$ is increasing in $t$, the cumulative hazard process $\{H(t);\ t \ge 0\}$ is a continuous increasing process, and the item fails when this process hits a random threshold $X$, the item’s hazard potential. Candidate stochastic processes for $\{H(t);\ t \ge 0\}$ are proposed in the reference given above, and the nature of the resulting lifetimes is described therein. Noteworthy are an increasing Lévy process and the maxima of a Wiener process.
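As an illustration of a hitting-time model in a dynamic environment, the sketch below simulates an increasing Lévy process, taken here to be a gamma process with hypothetical shape rate `a` and scale `b`, and records when each path first reaches an independent unit exponential hazard potential. The paths are only inspected at grid points, but because they are nondecreasing, the event $\{T > t\}$ at a grid time $t$ is exactly $\{H(t) < X\}$. The empirical survival can therefore be compared with $E[\exp(-H(t))] = (1 + b)^{-at}$, the Laplace transform of a gamma variate.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical gamma process: H(t) ~ Gamma(shape=a*t, scale=b), independent increments.
a, b = 1.5, 0.8
dt, t_max, n_paths = 0.01, 3.0, 20_000
n_steps = int(round(t_max / dt))

increments = rng.gamma(shape=a * dt, scale=b, size=(n_paths, n_steps))
H = np.cumsum(increments, axis=1)                  # H at times dt, 2*dt, ..., t_max
X = rng.exponential(scale=1.0, size=(n_paths, 1))  # one hazard potential per item

# Grid index of the first time each nondecreasing path reaches its X
# (n_steps means "no failure by t_max").
reached = H >= X
first_idx = np.where(reached.any(axis=1), reached.argmax(axis=1), n_steps)
T = (first_idx + 1) * dt   # failure time, resolved to the grid spacing

t = 2.0
k = int(round(t / dt))     # column k - 1 holds H(t)
empirical = np.mean(first_idx >= k)
# X is independent of H, so P(T > t) = E[exp(-H(t))] = (1 + b) ** (-a * t).
theory = (1.0 + b) ** (-a * t)
print(empirical, theory)
```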

In what follows we show how the notion of a hazard potential serves as a unifying platform for describing the competing risk phenomenon and the phenomenon of failure due to ageing or degradation in the presence of a marker (or a biomarker) such as crack size (or a CD4 cell count).

3.  Dependent  competing  risks  and  competing  risk  processes 

By “competing risks” one generally means failure due to agents that presumably compete with each other for an item’s lifetime. The traditional model for describing the competing risk phenomenon has been the reliability of a series system whose component lifetimes are independent or dependent. The idea here is that since the failure of any component of the system leads to the failure of the system, the system experiences multiple risks, each risk leading to failure. Thus if $T_i$ denotes the lifetime of component $i$, $i = 1, \dots, k$, say, then the cause of system failure is that component whose lifetime is the smallest of the $k$ lifetimes. Consequently, if $T$ denotes a system’s lifetime, then

$$\Pr(T > t) = P(H_1(t) < X_1, \dots, H_k(t) < X_k), \tag{3.1}$$

where $X_i$ is the hazard potential of the $i$-th component, and $H_i(t)$ its cumulative hazard (or the risk to component $i$) at time $t$. If the $X_i$’s are assumed to be independent (a simplifying assumption), then (3.1) leads to the result that

$$\Pr(T > t) = \exp[-(H_1(t) + \cdots + H_k(t))], \tag{3.2}$$

suggesting an additivity of cumulative hazard functions, or equivalently, an additivity of the risks. Were the $X_i$’s assumed dependent, then the nature of their dependence would dictate the manner in which the risks combine. Thus, for example, if for some $\theta$, $0 < \theta < 1$, we suppose that

$$\Pr(X_1 > x_1, X_2 > x_2 \mid \theta) = \exp(-x_1 - x_2 - \theta x_1 x_2),$$

namely one of Gumbel’s bivariate exponential distributions, then

$$\Pr(T > t \mid \theta) = \exp[-(H_1(t) + H_2(t) + \theta H_1(t) H_2(t))].$$

The  cumulative  hazards  (or  equivalently,  the  risks)  are  no  longer  additive. 

The series system model discussed above has also been used to describe the failure of a single item that experiences several failure-causing agents that compete with each other. However, we question this line of reasoning because a single item possesses only one unknown resource. Thus the $X_1, \dots, X_k$ of the series system model should be replaced by a single $X$, where $X_1 \overset{p}{=} X_2 \overset{p}{=} \cdots \overset{p}{=} X_k \overset{p}{=} X$ (in probability). To set the stage for the single item case, suppose that the item experiences $k$ agents, say $C_1, \dots, C_k$, where an agent is seen as a cause of failure; for example, the consumption of fatty foods. Let $H_i(t)$ be the consequence of agent $C_i$, were $C_i$ the only agent acting on the item. Then under the simultaneous action of all the $k$ agents, the item’s survival function is

$$\begin{aligned} \Pr(T > t;\ h_1(t), \dots, h_k(t)) &= P(H_1(t) < X, \dots, H_k(t) < X) \\ &= \exp(-\max(H_1(t), \dots, H_k(t))). \end{aligned} \tag{3.3}$$

Here again, the cumulative hazards are not additive.

Taking a clue from the fact that dependent hazard potentials lead us to a non-additivity of the cumulative hazard functions, we observe that the condition $X_1 \overset{p}{=} X_2 \overset{p}{=} \cdots \overset{p}{=} X_k \overset{p}{=} X$ (where $X_1 \overset{p}{=} X_2$ denotes that $X_1$ and $X_2$ are equal in probability) implies that $X_1, \dots, X_k$ are totally positively dependent, in the sense of Lehmann (1966). Thus (3.2) and (3.3) can be combined to claim that, under the series system model for competing risks with positively dependent hazard potentials (in the sense that $P(X_1 > x_1, \dots, X_k > x_k) \ge \prod_{i=1}^{k} P(X_i > x_i)$), $P(T > t)$ can be bounded as

$$\exp\Big(-\sum_{i=1}^{k} H_i(t)\Big) \le P(T > t) \le \exp(-\max(H_1(t), \dots, H_k(t))). \tag{3.4}$$

Whereas  (3.4)  above  may  be  known,  our  argument  leading  up  to  it  could  be  new. 
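The bounds in (3.4) are easy to evaluate. The sketch below computes the two extremes, (3.2) and (3.3), for two components with hypothetical cumulative hazards, and compares them with two dependent cases: a Marshall-Olkin law for $(X_1, X_2)$ (our illustrative choice; it is positively dependent and moves from one extreme to the other as `lam12` goes from 0 to 1) and the Gumbel law used above. Gumbel's family with $\theta > 0$ expresses negative dependence, and its survival probability falls below $\exp(-\sum_i H_i(t))$, so the lower bound in (3.4) presupposes positively dependent hazard potentials.

```python
import math


def surv_independent(H):
    """(3.2): independent hazard potentials, so the cumulative hazards add."""
    return math.exp(-sum(H))


def surv_common_hp(H):
    """(3.3): a single hazard potential shared by all k agents."""
    return math.exp(-max(H))


def surv_gumbel(H1, H2, theta):
    """Two components whose HPs follow Gumbel's bivariate exponential."""
    return math.exp(-(H1 + H2 + theta * H1 * H2))


def surv_marshall_olkin(H1, H2, lam12):
    """Two components whose HPs follow a Marshall-Olkin law with unit exponential
    marginals (illustrative): exp(-(1 - lam12)(h1 + h2) - lam12 * max(h1, h2))."""
    return math.exp(-(1.0 - lam12) * (H1 + H2) - lam12 * max(H1, H2))


H = (0.3, 0.5)  # hypothetical cumulative hazards H_1(t), H_2(t) at a fixed t
lower, upper = surv_independent(H), surv_common_hp(H)
print(lower, surv_marshall_olkin(*H, lam12=0.4), upper)  # positive dependence: inside (3.4)
print(surv_gumbel(*H, theta=0.5))                         # negative dependence: below the lower bound
```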
3.1. Competing risk processes

The prevailing view of what constitutes dependent competing risks entails a consideration of dependent component lifetimes in the series system model mentioned above. Our position on a proper framework for describing dependent competing risks is different. Since it is the $H_i(t)$’s that encapsulate the notion of risk, dependent competing risks should entail interdependence between the $H_i(t)$’s, $i = 1, \dots, k$. This would require that the $H_i(t)$’s be random, and a way to do so is to assume that each $\{H_i(t);\ t \ge 0\}$ is a stochastic process; we call this a competing risk process. The item fails when any one of the $\{H_i(t);\ t \ge 0\}$ processes first hits the item’s hazard potential $X$. To incorporate interdependence between the $H_i(t)$’s, we conceptualize a $k$-variate process $\{H_1(t), \dots, H_k(t);\ t \ge 0\}$ that we call a dependent competing risk process. Since the $H_i(t)$’s are increasing in $t$, one possible choice for each $\{H_i(t);\ t \ge 0\}$ could be a Brownian Maximum Process. That is, $H_i(t) = \sup_{0 \le s \le t} W_i(s)$, where $\{W_i(s);\ s \ge 0\}$ is a standard Brownian motion process. Dependence between the $H_i(t)$’s can be induced via a dependence between the $\{W_i(s);\ s \ge 0\}$ processes. Thus, for example, in the bivariate case, if $\rho$ denotes the correlation between two standard Brownian motion processes, then


$$\Pr(T > t) = \int_0^\infty P(H_1(t) < x, H_2(t) < x)\, e^{-x}\, dx$$


and  it  can  be  shown  (details  omitted)  that, 

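$$\Pr(T > t) = \frac{\displaystyle \int_0 \int_0 \exp\left(-\frac{a^2 + b^2 - 2\rho a b}{2t(1 - \rho^2)}\right) da\, db}{\displaystyle \int_0^\infty \int_0^\infty \exp\left(-\frac{u^2 + v^2 - 2\rho u v}{2t(1 - \rho^2)}\right) du\, dv}. \tag{3.5}$$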
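Because the derivation of (3.5) is omitted, an independent numerical cross-check of the integral representation above may be useful. The sketch below estimates $\Pr(T > t)$ by simulation rather than by evaluating (3.5). It relies on one observation of ours: since $X \sim$ exponential(1) is independent of the processes, $P(H_1(t) < X, H_2(t) < X) = E[\exp(-\max(H_1(t), H_2(t)))]$. The Brownian paths are simulated on a grid (grid size, path count and seed are arbitrary choices), so the running maxima are slightly underestimated and the survival estimate is biased slightly upward. For $\rho = 1$ the two maxima coincide and, by the reflection principle, the exact value is $2e^{t/2}\Phi(-\sqrt{t})$, which provides a check.

```python
import math
import numpy as np


def survival_bm_max(t, rho, n_paths=4000, n_steps=400, seed=3):
    """Monte Carlo estimate of Pr(T > t) when H_i(t) = sup_{s <= t} W_i(s) for two
    standard Brownian motions with correlation rho and X ~ exponential(1).

    Averages exp(-max(H_1(t), H_2(t))) over simulated paths. The running maximum
    is taken on a grid, so it slightly underestimates the continuous supremum.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    z1 = rng.standard_normal((n_paths, n_steps))
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal((n_paths, n_steps))
    W1 = math.sqrt(dt) * np.cumsum(z1, axis=1)
    W2 = math.sqrt(dt) * np.cumsum(z2, axis=1)
    # W(0) = 0, so each running maximum is at least 0.
    M1 = np.maximum(W1.max(axis=1), 0.0)
    M2 = np.maximum(W2.max(axis=1), 0.0)
    return float(np.mean(np.exp(-np.maximum(M1, M2))))


exact_rho1 = 2.0 * math.exp(0.5) * 0.5 * math.erfc(1.0 / math.sqrt(2.0))  # 2 e^{1/2} Phi(-1)
print(survival_bm_max(1.0, 1.0), exact_rho1)
print(survival_bm_max(1.0, 0.0))  # independent motions: larger maxima, lower survival
```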

Another possibility, again for the case of $k = 2$, is to assume that $\{H_1(t);\ t \ge 0\}$ is some non-negative, nondecreasing, right-continuous process, but that $\{H_2(t);\ t \ge 0\}$ has a sample path which is an impulse function of the form $H_2(t) = 0$ for all $t < t^*$, and $H_2(t^*) = \infty$ for some $t^* > 0$, where the rate of occurrence of the impulse at time $t$ depends on $H_1(t)$. The process $\{H_2(t);\ t \ge 0\}$ can be identified with some sort of a traumatic event that competes with the process $\{H_1(t);\ t \ge 0\}$ for the lifetime of the item. In the absence of trauma the item fails when the process $\{H_1(t);\ t \ge 0\}$ hits the item’s hazard potential. This scenario parallels the one considered by Lemoine and Wenocur [6], albeit in a context that is different from ours. By assuming that the probability of occurrence of an impulse in the time interval $[t, t + h)$, given that $H_1(t) = u$, is $1 - \exp(-uh)$, Lemoine and Wenocur [6] have shown that for $X = x$, the probability of survival of an item to time $t$ is of the form:
the  form: 


$$\Pr(T > t) = E\left[\mathbb{1}_{[0, x)}(H_1(t))\, \exp\left(-\int_0^t H_1(s)\, ds\right)\right], \tag{3.6}$$


where $\mathbb{1}_A(\cdot)$ is the indicator of a set $A$, and the expectation is with respect to the distribution of the process $\{H_1(t);\ t \ge 0\}$. As a special case, when $\{H_1(t);\ t \ge 0\}$ is a gamma process (see Singpurwalla [10]), and $x$ is infinite, so that $\mathbb{1}_{[0, \infty)}(H_1(t)) = 1$ for $H_1(t) \ge 0$, the above equation takes the form


$$\Pr(T > t) = \exp(-(1 + t)\log(1 + t) + t). \tag{3.7}$$


On  Competing  risk  and  degradation  processes 




The  closed  form  result  of  (3.7)  suffers  from  the  disadvantage  of  having  the  effect  of 
the  hazard  potential  de  facto  nullified.  The  more  realistic  case  of  (3.6)  will  call  for 
numerical  or  simulation  based  approaches.  These  remain  to  be  done;  our  aim  here 
has  been  to  give  some  flavor  of  the  possibilities. 
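Equation (3.7) can be reproduced by simulation, which also suggests how the numerical or simulation-based treatment of (3.6) might begin. The sketch assumes a gamma process whose value at time $t$ is Gamma(shape $t$, scale 1), a parameterization under which (3.7) follows from (3.6) with $x$ infinite. It discretizes each path on a grid (grid size, path count and seed are arbitrary) and approximates $\int_0^t H_1(s)\,ds$ by the trapezoidal rule. A finite threshold $x$ would amount to multiplying each path's weight by the indicator of $H_1(t) < x$ before averaging.

```python
import math
import numpy as np


def survival_trauma_gamma(t, n_paths=10_000, n_steps=200, seed=4):
    """Monte Carlo estimate of (3.6) with x infinite, i.e.
    E[exp(-integral_0^t H1(s) ds)], for a gamma process H1 whose increments over a
    step of length dt are Gamma(shape=dt, scale=1) (an assumed parameterization)."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    inc = rng.gamma(shape=dt, scale=1.0, size=(n_paths, n_steps))
    H1 = np.hstack([np.zeros((n_paths, 1)), np.cumsum(inc, axis=1)])  # H1 at 0, dt, ..., t
    integral = 0.5 * dt * (H1[:, 1:] + H1[:, :-1]).sum(axis=1)         # trapezoidal rule
    return float(np.mean(np.exp(-integral)))


def survival_closed_form(t):
    """Equation (3.7)."""
    return math.exp(-(1.0 + t) * math.log(1.0 + t) + t)


for t in (0.5, 1.0, 2.0):
    print(t, survival_trauma_gamma(t), survival_closed_form(t))
```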


4.  Biomarkers  and  degradation  processes 

A topic of current interest in both reliability and survival analysis pertains to assessing lifetimes based on observable surrogates, such as crack length, and biomarkers like CD4 cell counts. Here again the hazard potential provides a unified perspective for looking at the interplay between the unobservable failure-causing phenomenon and an observable surrogate. It is an assumed dependence between the above two processes that makes this interplay possible.

To engineers (cf. Bogdanoff and Kozin [1]) degradation is the irreversible accumulation of damage throughout life that leads to failure. The term “damage” is not defined; however, it is claimed that damage manifests itself via surrogates such as cracks, corrosion, measured wear, etc. Similarly, in the biosciences, the notion of “ageing” pertains to a unit’s position in a state space wherein the probabilities of failure are greater than in a former position. Ageing manifests itself in terms of biomedical and physical difficulties experienced by individuals and other such biomarkers.

With the above as background, our proposal here is to conceptualize ageing and degradation as unobservable constructs (or latent variables) that serve to describe a process that results in failure. These constructs can be seen as the cause of observable surrogates like cracks, corrosion, and biomarkers such as CD4 cell counts. This modelling viewpoint is not in keeping with the work on degradation modelling by Doksum [3] and the several references therein. The prevailing view is that degradation is an observable phenomenon that reveals itself in the guise of crack length and CD4 cell counts. The item fails when the observable phenomenon hits some threshold whose nature is not specified. Whereas this may be meaningful in some cases, a more general view is to separate the observable and the unobservable and to attribute failure to the behavior of the unobservable.

To mathematically describe the cause and effect phenomenon of degradation (or ageing) and the observables that it spawns, we view the (unobservable) cumulative hazard function as degradation, or ageing, and the biomarker as an observable process that is influenced by the former. The item fails when the cumulative hazard function hits the item’s hazard potential $X$, where $X$ has an exponential (1) distribution. With the above in mind, we introduce the degradation process as a bivariate stochastic process $\{H(t), Z(t);\ t \ge 0\}$, with $H(t)$ representing the unobservable degradation, and $Z(t)$ an observable marker. Whereas $H(t)$ is required to be nondecreasing, there is no such requirement on $Z(t)$. For the marker to be useful as a predictor of failure, it is necessary that $H(t)$ and $Z(t)$ be related to each other. One way to achieve this linkage is via a Markov Additive Process (cf. Cinlar [2]) wherein $\{Z(t);\ t \ge 0\}$ is a Markov process and $\{H(t);\ t \ge 0\}$ is an increasing Lévy process whose parameters depend on the state of the $\{Z(t);\ t \ge 0\}$ process. The ramifications of this set-up need to be explored.

Another possibility, and one that we are able to develop here in some detail (see Section 5), is to describe $\{Z(t);\ t \ge 0\}$ by a Wiener process (cracks do heal and CD4 cell counts do fluctuate), and the unobservable degradation process $\{H(t);\ t \ge 0\}$ by a Wiener Maximum Process, namely,

$$H(t) = \sup_{0 \le s \le t} Z(s). \tag{4.1}$$
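The set-up (4.1) is straightforward to simulate for a single item, and doing so makes two of its features visible: $H$ is nondecreasing by construction even though $Z$ is not, and before failure every value of $Z(t)$ lies below the hazard potential $X$. The sketch assumes hypothetical values for the drift `eta` and the diffusion parameter `sigma` of the marker, an Euler time step, and a unit exponential $X$; the sampled failure time is resolved only to the grid spacing.

```python
import numpy as np


def simulate_item(eta=1.0, sigma=1.0, dt=0.001, t_max=20.0, seed=5):
    """Simulate one item under (4.1): Z is a Wiener process with drift eta and
    diffusion sigma**2 (observable marker), H is its running maximum (unobservable
    degradation), and the item fails when H first reaches X ~ exponential(1).
    Returns times, Z, H, X and the failure time (np.inf if no failure by t_max)."""
    rng = np.random.default_rng(seed)
    n = int(round(t_max / dt))
    times = dt * np.arange(n + 1)
    dZ = eta * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    Z = np.concatenate([[0.0], np.cumsum(dZ)])   # Z(0) = 0
    H = np.maximum.accumulate(Z)                 # running maximum, nondecreasing
    X = rng.exponential(1.0)
    hit = np.nonzero(H >= X)[0]
    T = times[hit[0]] if hit.size else np.inf
    return times, Z, H, X, T


times, Z, H, X, T = simulate_item()
print(X, T)
```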

What makes the topic of analyzing degradation processes attractive is not just the modeling part; the statistical and computational issues that the set-up creates are quite challenging. Since $\{Z(t);\ t \ge 0\}$ is an observable process, how may one use observations on this process until some time, say $t^*$, to make inferences about the process of interest $H(t)$, for $t > t^*$? In other words, how does one assess $\Pr(T > t \mid \{Z(s);\ 0 \le s \le t^* < t\})$, where $T$ is an item’s time to failure? Furthermore, as is often the case, the process $\{Z(s);\ s \ge 0\}$ cannot be monitored continuously. Rather, what one is able to do is observe $\{Z(s);\ s \ge 0\}$ at $k$ discrete time points and use these as a basis for inference about $\Pr(T > t \mid \{Z(s);\ 0 \le s \le t^* < t\})$. These and other matters are discussed next in Section 5, which could be viewed as a prototype of what else is possible using other models for degradation.


5.  Inference  under  a  Wiener  maximum  process  for  degradation 

We start with some preliminaries about a Wiener process and its hitting time to a threshold. The notation used here is adopted from Doksum [3].


5.1.  Hitting  time  of  a  Wiener  maximum  process  to  a  random  threshold 

Let $Z_t$ denote an observable marker process $\{Z(t);\ t \ge 0\}$, and $H_t$ an unobservable degradation process $\{H(t);\ t \ge 0\}$. The relationship between these two processes is prescribed by (4.1). Suppose that $Z_t$ is described by a Wiener process with a drift parameter $\eta$ and a diffusion parameter $\sigma^2 > 0$. That is, $Z(0) = 0$ and $Z_t$ has independent increments. Also, for any $t > 0$, $Z(t)$ has a Gaussian distribution with $E(Z(t)) = \eta t$, and for any $0 \le t_1 < t_2$, $\operatorname{Var}[Z(t_2) - Z(t_1)] = (t_2 - t_1)\sigma^2$. Let $T_x$ denote the first time at which $Z_t$ crosses a threshold $x > 0$; that is, $T_x$ is the hitting time of $Z_t$ to $x$. Then, when $\eta = 0$,

$$\Pr(Z(t) > x) = \Pr(Z(t) > x \mid T_x \le t)\Pr(T_x \le t) + \Pr(Z(t) > x \mid T_x > t)\Pr(T_x > t), \tag{5.1}$$

so that

$$\Pr(T_x \le t) = 2\Pr(Z(t) > x). \tag{5.2}$$

This is because $\Pr(Z(t) > x \mid T_x \le t)$ can be set to 1/2, and the second term on the right-hand side of (5.1) is zero. When $Z(t)$ has a Gaussian distribution with mean $\eta t$ and variance $\sigma^2 t$, $\Pr(Z(t) > x)$ can be similarly obtained, and thence

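$\Pr(T_x \le t) \stackrel{\text{def}}{=} F_x(t \mid \eta, \sigma)$. Specifically, it can be seen that

$$F_x(t \mid \eta, \sigma) = \Phi\left(\sqrt{\frac{\lambda}{t}}\left(\frac{t}{\mu} - 1\right)\right) + \exp\left(\frac{2\lambda}{\mu}\right)\Phi\left(-\sqrt{\frac{\lambda}{t}}\left(\frac{t}{\mu} + 1\right)\right), \tag{5.3}$$

where $\mu = x/\eta$, $\lambda = x^2/\sigma^2$, and $\Phi$ denotes the standard normal distribution function. The distribution $F_x$ is the Inverse Gaussian Distribution (IG-Distribution) with parameters $\mu$ and $\lambda$, where $\mu = E(T_x)$ and $\mu^3/\lambda = \operatorname{Var}(T_x)$. Observe that when $\eta = 0$, both $E(T_x)$ and $\operatorname{Var}(T_x)$ are infinite, and thus for any meaningful description of a marker process via a Wiener process, the drift parameter $\eta$ needs to be greater than zero.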
The probability density of $F_x$ at $t$ takes the form

$$f_x(t \mid \eta, \sigma) = \sqrt{\frac{\lambda}{2\pi t^3}}\, \exp\left(-\frac{\lambda (t - \mu)^2}{2\mu^2 t}\right), \tag{5.4}$$

for $t > 0$ and $\lambda > 0$.
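Equations (5.3) and (5.4) can be coded directly. The sketch below writes both with $\mu = x/\eta$ and $\lambda = x^2/\sigma^2$, evaluates $\Phi$ through the complementary error function, and checks that the numerical derivative of (5.3) agrees with (5.4). The parameter values are hypothetical. The factor $\exp(2\lambda/\mu) = \exp(2x\eta/\sigma^2)$ overflows a double when that exponent exceeds roughly 709, so a production version would need a log-scale formulation.

```python
import math


def std_normal_cdf(z):
    """Phi(z) via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_cdf(t, x, eta, sigma):
    """(5.3): F_x(t | eta, sigma) = Pr(T_x <= t), with mu = x/eta, lam = x**2/sigma**2."""
    if t <= 0.0:
        return 0.0
    mu, lam = x / eta, x ** 2 / sigma ** 2
    r = math.sqrt(lam / t)
    return (std_normal_cdf(r * (t / mu - 1.0))
            + math.exp(2.0 * lam / mu) * std_normal_cdf(-r * (t / mu + 1.0)))


def ig_pdf(t, x, eta, sigma):
    """(5.4): density of T_x at t > 0."""
    mu, lam = x / eta, x ** 2 / sigma ** 2
    return (math.sqrt(lam / (2.0 * math.pi * t ** 3))
            * math.exp(-lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t)))


x, eta, sigma = 2.0, 1.0, 1.0   # hypothetical values
h = 1e-5
numeric = (ig_cdf(2.0 + h, x, eta, sigma) - ig_cdf(2.0 - h, x, eta, sigma)) / (2 * h)
print(numeric, ig_pdf(2.0, x, eta, sigma))   # derivative of (5.3) vs (5.4)
```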

We now turn attention to $H_t$, the process of interest. We first note that because of (4.1), $H(0) = 0$, and $H(t)$ is non-decreasing in $t$; this is what was required of $H_t$. An item experiencing the process $H_t$ fails when $H_t$ first crosses a threshold $X$, where $X$ is unknown. However, our uncertainty about $X$ is described by an exponential distribution with probability density $f(x) = e^{-x}$. Let $T$ denote the time to failure of the item in question. Then, following the line of reasoning leading to (5.1), we would have, in the case of $\eta = 0$ and a given threshold $X = x$,

$$\Pr(T \le t \mid X = x) = 2\Pr(Z(t) > x).$$

Furthermore, because of (4.1), the hitting time of $H_t$ to a random threshold $X$ will coincide with $T_X$, the hitting time of $Z_t$ (with $\eta > 0$) to $X$. Consequently,

$$\Pr(T \le t) = \Pr(T_X \le t) = \int_0^\infty \Pr(T_X \le t \mid X = x)\, f(x)\, dx = \int_0^\infty \Pr(T_x \le t)\, e^{-x}\, dx = \int_0^\infty F_x(t \mid \eta, \sigma)\, e^{-x}\, dx.$$


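Rewriting $F_x(t \mid \eta, \sigma)$ in terms of the marker process parameters $\eta$ and $\sigma$, and treating these parameters as known, we have

$$\Pr(T \le t \mid \eta, \sigma) = \int_0^\infty \left[\Phi\left(\frac{\eta t - x}{\sigma\sqrt{t}}\right) + \exp\left(\frac{2\eta x}{\sigma^2}\right)\Phi\left(-\frac{\eta t + x}{\sigma\sqrt{t}}\right)\right] e^{-x}\, dx \stackrel{\text{def}}{=} F(t \mid \eta, \sigma) \tag{5.5}$$

as our assessment of an item’s time to failure with $\eta$ and $\sigma$ assumed known. It is convenient to summarize the above development as follows.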

Theorem 5.1. The time to failure $T$ of an item experiencing failure due to ageing or degradation described by a Wiener Maximum Process with a drift parameter $\eta > 0$ and a diffusion parameter $\sigma^2 > 0$ has the distribution function $F(t \mid \eta, \sigma)$, which is a location mixture of Inverse Gaussian Distributions. This distribution function, which is also that of the hitting time of the process to an exponential (1) random threshold, is given by (5.5).
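Theorem 5.1 can be checked numerically. The sketch below evaluates (5.5) by trapezoidal quadrature over the threshold $x$ and compares it with a simulation in which $X$ is drawn from a unit exponential and $T_X$ from the corresponding inverse Gaussian law, using numpy's `Generator.wald`, whose `mean` and `scale` arguments play the roles of $\mu$ and $\lambda$ in (5.3). The values $\eta = \sigma = 1$ mirror those used for Figure 1; the quadrature range, grid size, sample size and seed are arbitrary choices.

```python
import math
import numpy as np


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ig_cdf(t, x, eta, sigma):
    """F_x(t | eta, sigma) of (5.3), written with the marker parameters."""
    s = sigma * math.sqrt(t)
    return (std_normal_cdf((eta * t - x) / s)
            + math.exp(2.0 * eta * x / sigma ** 2) * std_normal_cdf(-(eta * t + x) / s))


def failure_cdf(t, eta, sigma, x_max=40.0, n=4001):
    """(5.5): F(t | eta, sigma) = integral_0^inf F_x(t | eta, sigma) e^{-x} dx,
    by the trapezoidal rule on [0, x_max]; the tail beyond x_max is neglected."""
    xs = np.linspace(0.0, x_max, n)
    vals = np.array([ig_cdf(t, x, eta, sigma) * math.exp(-x) for x in xs])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(xs)))


def failure_cdf_mc(t, eta, sigma, n=200_000, seed=6):
    """Simulation check: X ~ exponential(1), then T_X ~ inverse Gaussian with
    mean X/eta and shape X**2/sigma**2, and count T_X <= t."""
    rng = np.random.default_rng(seed)
    X = np.maximum(rng.exponential(1.0, n), 1e-12)   # guard against an exact zero
    T = rng.wald(X / eta, X ** 2 / sigma ** 2)
    return float(np.mean(T <= t))


for t in (1.0, 3.0):
    print(t, failure_cdf(t, 1.0, 1.0), failure_cdf_mc(t, 1.0, 1.0))
```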

In Figure 1 we illustrate the behavior of the IG-Distribution function $F_x(t)$, for $x = 1, 2, 3, 4$, and $5$, when $\eta = \sigma = 1$, and superimpose on these a plot of $F(t \mid \eta = \sigma = 1)$ to show the effect of averaging over the threshold $x$. As can be expected, averaging makes the S-shapedness of the distribution functions less pronounced.


5.2.  Assessing  lifetimes  using  surrogate  (biomarker)  data 

The material leading up to Theorem 5.1 is based on the thesis that $\eta$ and $\sigma^2$ are known. In actuality, they are of course unknown. Thus, besides the hazard potential $X$, $\eta$ and $\sigma^2$ constitute the unknowns in our set-up. To assess $\eta$ and $\sigma^2$ we may use prior information and, when available, data on the underlying processes $Z_t$ and $H_t$. The prior on $X$ is an exponential distribution with scale one, and this prior can also be updated using process data. In the remainder of this section, we focus attention on the case of a single item and describe the nature of the data that can be collected on it. We then outline an overall plan for incorporating these data into our analyses.

FIG. 1. The IG-Distribution with thresholds x = 1, ..., 5 and the averaged IG-Distribution (horizontal axis: time to failure).

In Section 5.3 we give details about the inferential steps. The scenario of observing several items to failure in order to predict the lifetime of a future item will not be discussed.

In principle, we have assumed that $H_t$ is an unobservable process. This is certainly true in our particular case when the observable marker process $Z_t$ cannot be continuously monitored. Thus it is not possible to collect data on $H_t$. Contrast our scenario to that of Doksum [3], Lu and Meeker [7], and Lu, Meeker and Escobar [8], who assume that degradation is an observable process and who use data on degradation to predict an item’s lifetime. We assume that it is the surrogate (or the biomarker) process $Z_t$ that is observable, but only prior to $T$, the item’s failure time. In some cases we may be able to observe $Z_t$ at $t = T$, but doing so in the case of a single item would be futile, since our aim is to assess an unobserved $T$. Data on $Z_t$ will provide information not only about $\eta$ and $\sigma^2$ but also about $X$; this is because for any $t < T$, we know that $X > Z(t)$. Thus, as claimed by Nair [9], data on (the observable surrogates of) degradation help sharpen lifetime assessments, because a knowledge of $\eta$, $\sigma^2$ and $X$ translates to a knowledge of $T$.

It is often the case (at least we assume so) that $Z_t$ cannot be continuously monitored, so that observations on $Z_t$ can be had only at times $0 < t_1 < t_2 < \cdots < t_k < T$, yielding $\mathbf{Z} = (Z(t_1), \dots, Z(t_k))$ as data. Furthermore, based on $Z(t_k)$ we are able to assert that $X > Z(t_k)$. This means that our updated uncertainty about $X$ will be encapsulated by a shifted exponential distribution with scale parameter one and a location (or shift) parameter $Z(t_k)$.

Thus, for an item experiencing failure due to degradation whose marker process yields $\mathbf{Z}$ as data, our aim will be to assess the item’s residual life $T - t_k$. That is, for any $u > 0$, we need to know $\Pr(T > t_k + u; \mathbf{Z}) = \Pr(T > t_k + u; T > t_k)$, and this, under a certain assumption (cf. Singpurwalla [12]), is tantamount to knowing

$$\frac{\Pr(T > t_k + u)}{\Pr(T > t_k)}, \tag{5.6}$$
for $0 < u < \infty$. To assess the two quantities in the above ratio, we need to consider the quantity $\Pr(T > t; \mathbf{Z})$, for some $t > 0$. Let $\pi(\eta, \sigma^2, x; \mathbf{Z})$ encapsulate our uncertainty about $\eta$, $\sigma^2$ and $X$ in the light of the data $\mathbf{Z}$. In Section 5.3 we describe our approach for assessing $\pi(\eta, \sigma^2, x; \mathbf{Z})$. Now

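$$\Pr(T > t; \mathbf{Z}) = \int_{\eta, \sigma^2, x} \Pr(T > t \mid \eta, \sigma^2, x; \mathbf{Z})\, \pi(\eta, \sigma^2, x; \mathbf{Z})\, d\eta\, d\sigma^2\, dx \tag{5.7}$$

$$= \int_{\eta, \sigma^2, x} \Pr(T_x > t \mid \eta, \sigma^2)\, \pi(\eta, \sigma^2, x; \mathbf{Z})\, d\eta\, d\sigma^2\, dx$$

$$= \int_{\eta, \sigma^2, x} \bar{F}_x(t \mid \eta, \sigma)\, \pi(\eta, \sigma^2, x; \mathbf{Z})\, d\eta\, d\sigma^2\, dx, \tag{5.8}$$

where $\bar{F}_x(t \mid \eta, \sigma) = 1 - F_x(t \mid \eta, \sigma)$, and $F_x$ is the IG-Distribution of (5.3).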

Implicit in going from (5.7) to (5.8) is the assumption that the event $(T > t)$
is independent of $\mathbf{Z}$ given $\eta$, $\sigma^2$ and $X$. In Section 5.3 we will propose that $\eta$ be
allowed to vary between $a$ and $b$; also, $\sigma^2 > 0$, and having observed $Z(t_k)$, it is clear
that $x$ must be greater than $Z(t_k)$. Consequently, (5.8) gets written as

$$\Pr(T > t; \mathbf{Z}) = \int_{\eta=a}^{b} \int_{\sigma^2=0}^{\infty} \int_{x=Z(t_k)}^{\infty} \overline{F}_x(t \mid \eta, \sigma)\, \pi(\eta, \sigma^2, x; \mathbf{Z})\,(dx)(d\sigma^2)(d\eta), \tag{5.9}$$

and the above can be used to obtain $\Pr(T > t_k + u; \mathbf{Z})$ and $\Pr(T > t_k; \mathbf{Z})$. Once
these are obtained, we are able to assess the residual life $\Pr(T > t_k + u \mid T > t_k)$,
for $u > 0$.
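Since (5.9) is an expectation of $\overline{F}_x(t \mid \eta, \sigma)$ under $\pi(\eta, \sigma^2, x; \mathbf{Z})$, one natural numerical route is Monte Carlo averaging over posterior draws. The sketch below is a minimal version of that idea. It assumes that the IG distribution of (5.3) is the first-passage time of a Wiener process with drift $\eta$ and variance $\sigma^2$ to the level $x$, i.e. inverse Gaussian with mean $x/\eta$ and shape $x^2/\sigma^2$; the function names are hypothetical, and the $(\eta, \sigma^2)$ draws in the usage lines are made up rather than produced by the procedure of Section 5.3.

```python
import math
import random


def log_norm_sf(w):
    """log Pr(N(0,1) > w), kept finite for large positive w."""
    if w < 8.0:
        return math.log(0.5 * math.erfc(w / math.sqrt(2.0)))
    # Asymptotic (Mills-ratio) series for the Gaussian tail; adequate for w >= 8
    series = 1.0 - 1.0 / w**2 + 3.0 / w**4 - 15.0 / w**6
    return -0.5 * w * w - math.log(w * math.sqrt(2.0 * math.pi)) + math.log(series)


def ig_survival(t, x, eta, sigma2):
    """Pr(T_x > t), T_x the first time a Wiener process with drift eta > 0 and
    variance sigma2, started at 0, reaches x > 0: IG(mean x/eta, shape x^2/sigma2)."""
    if t <= 0.0:
        return 1.0
    mean, shape = x / eta, x * x / sigma2
    r = math.sqrt(shape / t)
    z1 = r * (t / mean - 1.0)
    z2 = r * (t / mean + 1.0)
    # survival = Phi(-z1) - exp(2*shape/mean) * Phi(-z2); second term in log space
    first = math.exp(log_norm_sf(z1))
    second = math.exp(2.0 * shape / mean + log_norm_sf(z2))
    return min(1.0, max(0.0, first - second))


def prob_survive(t, draws):
    """Monte Carlo version of (5.9): average of Pr(T_x > t | eta, sigma2) over
    draws (eta, sigma2, x) assumed to come from pi(eta, sigma2, x; Z)."""
    return sum(ig_survival(t, x, eta, s2) for eta, s2, x in draws) / len(draws)


def residual_life(u, t_k, draws):
    """Ratio (5.6): Pr(T > t_k + u) / Pr(T > t_k), each computed via (5.9)."""
    return prob_survive(t_k + u, draws) / prob_survive(t_k, draws)


# Hypothetical example: (eta, sigma2) draws are invented for illustration;
# x is drawn from the shifted exponential on (Z(t_k), infinity).
rng = random.Random(1)
t_k, z_k = 2.0, 1.5
draws = [(rng.uniform(0.6, 0.9), rng.uniform(0.05, 0.15), z_k + rng.expovariate(1.0))
         for _ in range(2000)]
print([round(residual_life(u, t_k, draws), 3) for u in (0.0, 1.0, 2.0, 4.0)])
```

Working on the log scale for the second term avoids overflow of $\exp(2x\eta/\sigma^2)$ when the threshold is large relative to the noise.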

We now turn our attention to describing a Bayesian approach for specifying $\pi(\eta, \sigma^2,
x; \mathbf{Z})$.

5.3. Assessing the posterior distribution of $\eta$, $\sigma^2$ and $X$

The purpose of this section is to describe an approach for assessing $\pi(\eta, \sigma^2, x; \mathbf{Z})$,
the posterior distribution of the unknowns in our set-up. For this, we start by
supposing that $\mathbf{Z}$ is an unknown and consider the quantity $\pi(\eta, \sigma^2, x \mid \mathbf{Z})$. This is
done to legitimize the ensuing simplifications. By the multiplication rule, and using
obvious notation,

$$\pi(\eta, \sigma^2, x \mid \mathbf{Z}) = \pi_1(\eta, \sigma^2 \mid X, \mathbf{Z})\, \pi_2(X \mid \mathbf{Z}).$$

It makes sense to suppose that $\eta$ and $\sigma^2$ do not depend on $X$; thus

$$\pi(\eta, \sigma^2, x \mid \mathbf{Z}) = \pi_1(\eta, \sigma^2 \mid \mathbf{Z})\, \pi_2(X \mid \mathbf{Z}). \tag{5.10}$$

However, $\mathbf{Z}$ is an observed quantity. Thus (5.10) needs to be recast as:

$$\pi(\eta, \sigma^2, x; \mathbf{Z}) = \pi_1(\eta, \sigma^2; \mathbf{Z})\, \pi_2(X; \mathbf{Z}). \tag{5.11}$$

Regarding the quantity $\pi_2(X; \mathbf{Z})$, the only information that $\mathbf{Z}$ provides about
$X$ is that $X > Z(t_k)$. Thus $\pi_2(X; \mathbf{Z})$ becomes $\pi_2(X; Z(t_k))$. We may now invoke
Bayes' law on $\pi_2(X; Z(t_k))$ and, using the fact that the prior on $X$ is an exponential
(1) distribution on $(0, \infty)$, obtain the result that the posterior of $X$ is also an
exponential (1) distribution, but on $(Z(t_k), \infty)$. That is, $\pi_2(X; Z(t_k))$ is a shifted
exponential distribution of the form $\exp(-(x - Z(t_k)))$, for $x > Z(t_k)$.
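Written out, the Bayes step for $X$ is a one-line normalization, treating the indicator of the event $X > Z(t_k)$ as the likelihood:

$$\pi_2(x; Z(t_k)) = \frac{e^{-x}\,\mathbf{1}\{x > Z(t_k)\}}{\int_{Z(t_k)}^{\infty} e^{-v}\,dv} = \frac{e^{-x}}{e^{-Z(t_k)}} = e^{-(x - Z(t_k))}, \qquad x > Z(t_k).$$

This is the memoryless property of the exponential distribution: the excess $X - Z(t_k)$ is again exponential (1).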

Turning attention to the quantity $\pi_1(\eta, \sigma^2; \mathbf{Z})$ we note, invoking Bayes' law, that

$$\pi_1(\eta, \sigma^2; \mathbf{Z}) \propto \mathcal{L}(\eta, \sigma^2; \mathbf{Z})\, \pi^*(\eta, \sigma^2), \tag{5.12}$$

where $\mathcal{L}(\eta, \sigma^2; \mathbf{Z})$ is the likelihood of $\eta$ and $\sigma^2$ with $\mathbf{Z}$ fixed, and $\pi^*(\eta, \sigma^2)$ our prior
on $\eta$ and $\sigma^2$. In what follows we discuss the nature of the likelihood and the prior.

The Likelihood of $\eta$ and $\sigma^2$

Let $Y_1 = Z(t_1)$, $Y_2 = (Z(t_2) - Z(t_1)), \ldots, Y_k = (Z(t_k) - Z(t_{k-1}))$, and $s_1 = t_1$,
$s_2 = t_2 - t_1, \ldots, s_k = t_k - t_{k-1}$. Because the Wiener process has independent
increments, the $Y_i$'s are independent. Also, $Y_i \sim N(\eta s_i, \sigma^2 s_i)$, $i = 1, \ldots, k$, where
$N(\mu, \xi^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\xi^2$. Thus, the
joint density of the $y_i$'s, $i = 1, \ldots, k$, which is useful for writing out a likelihood of
$\eta$ and $\sigma^2$, will be of the form

$$f(y_1, \ldots, y_k \mid \eta, \sigma^2) = \prod_{i=1}^{k} \frac{1}{\sigma\sqrt{s_i}}\, \phi\!\left(\frac{y_i - \eta s_i}{\sigma\sqrt{s_i}}\right),$$

where $\phi$ denotes a standard Gaussian probability density function. As a consequence
of the above, the likelihood of $\eta$ and $\sigma^2$ with $\mathbf{y} = (y_1, \ldots, y_k)$ fixed can be written
as:


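$$\mathcal{L}(\eta, \sigma^2; \mathbf{y}) = \prod_{i=1}^{k} \frac{1}{\sigma\sqrt{2\pi s_i}} \exp\left[-\frac{1}{2}\left(\frac{y_i - \eta s_i}{\sigma\sqrt{s_i}}\right)^2\right]. \tag{5.13}$$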
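As a check on (5.13), the sketch below evaluates its logarithm from a record of inspections and compares it with the closed-form maximisers $\hat\eta = \sum_i y_i / \sum_i s_i = Z(t_k)/t_k$ and $\hat\sigma^2 = k^{-1}\sum_i (y_i - \hat\eta s_i)^2 / s_i$, which follow from setting the derivatives of $\log \mathcal{L}$ to zero. It assumes $Z(0) = 0$; the function names and the inspection record are hypothetical.

```python
import math


def increments(times, values):
    """Turn inspection times t_1 < ... < t_k and readings Z(t_i), with Z(0) = 0,
    into the increments y_i and spacings s_i used in (5.13)."""
    ys, ss = [], []
    prev_t, prev_z = 0.0, 0.0
    for t, z in zip(times, values):
        ys.append(z - prev_z)
        ss.append(t - prev_t)
        prev_t, prev_z = t, z
    return ys, ss


def log_likelihood(eta, sigma2, ys, ss):
    """log L(eta, sigma2; y) of (5.13): independent y_i ~ N(eta*s_i, sigma2*s_i)."""
    total = 0.0
    for y, s in zip(ys, ss):
        var = sigma2 * s
        total += -0.5 * math.log(2.0 * math.pi * var) - (y - eta * s) ** 2 / (2.0 * var)
    return total


def mle(ys, ss):
    """Closed-form maximisers of (5.13)."""
    eta_hat = sum(ys) / sum(ss)
    sigma2_hat = sum((y - eta_hat * s) ** 2 / s for y, s in zip(ys, ss)) / len(ys)
    return eta_hat, sigma2_hat


# Hypothetical inspection record (times and marker readings are made up)
ys, ss = increments([1.0, 2.0, 3.0, 4.0], [0.9, 1.4, 2.5, 3.1])
eta_hat, sigma2_hat = mle(ys, ss)
print(eta_hat, sigma2_hat, log_likelihood(eta_hat, sigma2_hat, ys, ss))
```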


The Prior on $\eta$ and $\sigma^2$

Turning attention to $\pi^*(\eta, \sigma^2)$, the prior on $\eta$ and $\sigma^2$, it seems reasonable to suppose
that $\eta$ and $\sigma^2$ are not independent. It makes sense to suppose that the fluctuations
of $Z_t$ depend on the trend $\eta$: the larger the $\eta$, the bigger the $\sigma^2$, so long as there is
a constraint on the value of $\eta$. If $\eta$ is not constrained, the marker will take negative
values. Thus, we need to consider, in obvious notation,

$$\pi^*(\eta, \sigma^2) = \pi^*(\sigma^2 \mid \eta)\, \pi^*(\eta). \tag{5.14}$$

Since $\eta$ can take values in $(0, \infty)$, and since $\eta = \tan\theta$ (see Figure 2), $\theta$ must
take values in $(0, \pi/2)$.

To impose a constraint on $\eta$, we may suppose that $\theta$ has a translated beta density
on $(a, b)$, where $0 < a < b < \pi/2$. That is, $\theta = a + (b - a)W$, where $W$ has a beta
distribution on $(0, 1)$. For example, $a$ could be $\pi/8$ and $b$ could be $3\pi/8$. Note that
were $\theta$ assumed to be uniform over $(0, \pi/2)$, then $\eta$ would have a density of the form
$2/[\pi(1 + \eta^2)]$, which is a folded Cauchy.
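The translated-beta construction is easy to simulate. The sketch below draws $\eta = \tan\theta$ with $\theta = a + (b - a)W$, using the example values $a = \pi/8$ and $b = 3\pi/8$ from the text; the Beta$(2, 2)$ choice for $W$ is our assumption, since the text leaves the beta parameters open, and the function names are hypothetical. It also checks the folded Cauchy remark: when $\theta$ is uniform on $(0, \pi/2)$, $\Pr(\eta \le 1) = (2/\pi)\arctan 1 = 1/2$.

```python
import math
import random


def sample_eta(rng, a=math.pi / 8, b=3 * math.pi / 8, alpha=2.0, beta=2.0):
    """Draw eta = tan(theta), theta = a + (b - a) * W, W ~ Beta(alpha, beta).
    The Beta(2, 2) default is an illustrative assumption."""
    w = rng.betavariate(alpha, beta)
    return math.tan(a + (b - a) * w)


def folded_cauchy_cdf(eta):
    """Pr(tan(theta) <= eta) for theta ~ Uniform(0, pi/2): (2/pi) * arctan(eta),
    the integral of the folded Cauchy density 2 / (pi * (1 + eta**2))."""
    return 2.0 / math.pi * math.atan(eta)


rng = random.Random(0)
etas = [sample_eta(rng) for _ in range(10000)]
# Always inside (tan(pi/8), tan(3*pi/8)), i.e. about (0.414, 2.414)
print(min(etas), max(etas))

# Folded Cauchy check: the fraction of tan(Uniform(0, pi/2)) draws below 1 is close to 1/2
uniform_etas = [math.tan(rng.uniform(0.0, math.pi / 2)) for _ in range(10000)]
frac = sum(e <= 1.0 for e in uniform_etas) / len(uniform_etas)
print(frac, folded_cauchy_cdf(1.0))
```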


Fig. 2. Relationship between $Z_t$ and $\eta$.
The choice of $\pi^*(\sigma^2 \mid \eta)$ is trickier. The usual approach in such situations is to
opt for natural conjugacy. Accordingly, we suppose that $\psi \stackrel{\text{def}}{=} \sigma^2$ has the prior

$$\pi^*(\psi \mid \eta) \propto \psi^{-(\nu/2 + 1)} \exp\left(-\frac{\eta}{2\psi}\right), \tag{5.15}$$

where $\nu$ is a parameter of the prior.

Note that $E(\psi \mid \eta, \nu) = \eta/(\nu - 2)$, and so $\psi = \sigma^2$ increases with $\eta$; since $\eta$ is
constrained to lie between $a$ and $b$, this imposes a constraint on $\sigma^2$ as well.

To pin down the parameter $\nu$, we anchor on time $t = 1$, and note that since
$E(Z_1) = \eta$ and $\mathrm{Var}(Z_1) = \sigma^2 = \psi$, $\sigma$ should be such that $A\sigma$ does not exceed
$\eta$ for some $A = 1, 2, 3, \ldots$; otherwise $Z_1$ will become negative. With $A = 3$, $\eta =
3\sigma$ and so $\psi = \sigma^2 = \eta^2/9$. Thus $\nu$ should be such that $E(\sigma^2 \mid \eta, \nu) = \eta^2/9$. But
$E(\sigma^2 \mid \eta, \nu) = \eta/(\nu - 2)$, and therefore by setting $\eta/(\nu - 2) = \eta^2/9$, we would have
$\nu = 9/\eta + 2$. In general, were we to set $\eta = A\sigma$, then $\nu = A^2/\eta + 2$, for $A = 1, 2, \ldots$.

Consequently, $\nu/2 + 1 = (A^2/\eta + 2)/2 + 1 = A^2/(2\eta) + 2$, and thus

$$\pi^*(\sigma^2 \mid \eta, A) \propto (\sigma^2)^{-(A^2/(2\eta) + 2)} \exp\left(-\frac{\eta}{2\sigma^2}\right) \tag{5.16}$$

would be our prior of $\sigma^2$, conditioned on $\eta$, with $A = 1, 2, \ldots$ serving as a prior
parameter. Values of $A$ can be used to explore sensitivity to the prior.
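The calibration of $\nu$ can be checked numerically. Reading (5.15) as an inverse gamma density with shape $\nu/2$ and scale $\eta/2$ (our reading, consistent with $E(\psi \mid \eta, \nu) = \eta/(\nu - 2)$), a draw is $(\eta/2)/G$ with $G \sim \text{Gamma}(\nu/2, 1)$. The sketch below, with hypothetical function names, sets $\nu = A^2/\eta + 2$ and confirms that the prior mean of $\sigma^2$ is $\eta^2/A^2$.

```python
import random


def nu_from(A, eta):
    """Prior parameter from the anchoring argument eta / (nu - 2) = eta**2 / A**2."""
    return A * A / eta + 2.0


def prior_mean_sigma2(A, eta):
    """E(sigma2 | eta, nu) = eta / (nu - 2), which equals eta**2 / A**2 here."""
    return eta / (nu_from(A, eta) - 2.0)


def sample_sigma2(rng, eta, A):
    """One draw from (5.15)/(5.16): inverse gamma with shape nu/2 and scale eta/2."""
    nu = nu_from(A, eta)
    return (eta / 2.0) / rng.gammavariate(nu / 2.0, 1.0)


rng = random.Random(7)
eta, A = 1.0, 3
draws = [sample_sigma2(rng, eta, A) for _ in range(100000)]
mc_mean = sum(draws) / len(draws)
print(prior_mean_sigma2(A, eta), mc_mean)

# Sensitivity to A: larger A gives a smaller prior mean for sigma2
for a_val in (1, 2, 3):
    print(a_val, prior_mean_sigma2(a_val, eta))
```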

This completes our discussion on choosing priors for the parameters of a Wiener
process model for $Z_t$. All the necessary ingredients for implementing (5.9) are now
at hand. This will have to be done numerically, but it does not appear to pose major
obstacles. We are currently working on this matter using both simulated and real
data.
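One way to carry out the numerical work is sampling-importance-resampling: draw $(\eta, \sigma^2)$ from the prior (5.14)-(5.16), weight each draw by the likelihood (5.13), resample in proportion to the weights, and attach an $x$ drawn from the shifted exponential on $(Z(t_k), \infty)$, as the factorization (5.11) allows. This is our illustration, not the authors' procedure. The Beta$(2, 2)$ law for $W$, $A = 3$, the sample sizes, the function names and the data are all assumptions, and prior-as-proposal sampling becomes inefficient when the likelihood is much more concentrated than the prior.

```python
import math
import random


def log_lik(eta, sigma2, ys, ss):
    """log of (5.13): independent increments y_i ~ N(eta * s_i, sigma2 * s_i)."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma2 * s)
               - (y - eta * s) ** 2 / (2.0 * sigma2 * s)
               for y, s in zip(ys, ss))


def draw_prior(rng, A=3, a=math.pi / 8, b=3 * math.pi / 8):
    """One draw of (eta, sigma2) from (5.14): eta = tan(a + (b - a) W) with
    W ~ Beta(2, 2) (assumed), then sigma2 | eta inverse gamma with shape nu/2,
    scale eta/2 and nu = A**2 / eta + 2, as in (5.15)-(5.16)."""
    eta = math.tan(a + (b - a) * rng.betavariate(2.0, 2.0))
    nu = A * A / eta + 2.0
    sigma2 = (eta / 2.0) / rng.gammavariate(nu / 2.0, 1.0)
    return eta, sigma2


def posterior_draws(ys, ss, z_k, n_prop=20000, n_out=2000, seed=0):
    """Sampling-importance-resampling for (5.12) with the prior as proposal;
    each resampled (eta, sigma2) gets an x from the shifted exponential on (z_k, inf)."""
    rng = random.Random(seed)
    props = [draw_prior(rng) for _ in range(n_prop)]
    log_w = [log_lik(eta, s2, ys, ss) for eta, s2 in props]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]  # rescaled to avoid underflow
    picked = rng.choices(props, weights=weights, k=n_out)
    return [(eta, s2, z_k + rng.expovariate(1.0)) for eta, s2 in picked]


# Hypothetical inspection record: increments y_i over unit spacings s_i, Z(t_k) = 3.1
ys, ss = [0.9, 0.5, 1.1, 0.6], [1.0, 1.0, 1.0, 1.0]
post = posterior_draws(ys, ss, z_k=3.1)
eta_mean = sum(e for e, _, _ in post) / len(post)
print(eta_mean, sum(s2 for _, s2, _ in post) / len(post))
```

The resulting triples $(\eta, \sigma^2, x)$ are draws from $\pi(\eta, \sigma^2, x; \mathbf{Z})$ as factored in (5.11), which is what the integral (5.9) requires.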

6.  Conclusion 

Our aim here was to describe how Lehmann's original ideas on (positive) dependence,
framed in the context of nonparametrics, have been germane to reliability
and survival analysis, and even so in the context of survival dynamics. The notion
of a hazard potential has been the “hook” via which we can attribute the cause
of dependence, and also develop a framework for an appreciation of competing
risks and degradation. The hazard potential provides a platform through which the
above can be discussed in a unified manner. Our platform pertains to the hitting
times of stochastic processes to a random threshold. With degradation modeling,
the unobservable cumulative hazard function is seen as the metric of degradation
(as opposed to an observable, like crack growth), and when modeling competing
risks, the cumulative hazard is interpreted as a risk. Our goal here was not to solve
any definitive problem with real data; rather, it was to propose a way of looking at
two commonly encountered problems in reliability and survival analysis, problems
that have been well discussed, but which have not as yet been recognized as having
a common framework. The material of Section 5 is purely illustrative; it shows what
is possible when one has access to real data. We are currently pursuing the details
underlying the several avenues and possibilities that have been outlined here.
Foreword


The entries in this volume have been categorized into seven parts, each part
emphasizing a theme that in our judgment seems poised for the future development
of reliability as an academic discipline with relevance. The seven parts
are: Networks and Systems; Recurrent Events; Information and Design; The
Failure Rate Function and Burn-in; Software Reliability and Random Environments;
Reliability in Composites and Orthopedics; and Reliability in Finance
and Forensics. Embedded within the above are some of the other currently
active topics such as causality, cascading, exchangeability, expert testimony,
hierarchical modeling, optimization and survival analysis. Collectively, these,
when linked with utility theory, constitute the science base of risk analysis.

Part I on Networks and Systems consists of three entries, each striking a
unique and different chord. Boland and Samaniego introduce the notion of
the "signature" of a system. The term signature (or imprint) resonates well
with engineering, wherein it is used to describe the characteristics of rotating
machinery vis-à-vis its vibration. Boland and Samaniego use their notion to
characterize the manner in which a system is put together, irrespective of the
inherent quality of each member of the system. They make connections between
their notion and the notions used in computer science. Their treatment
of the topic is exhaustive; it promises to generate added interest in the notion
of signatures. The second paper, by Kuo and Prasad, is in some sense unique
among all other entries because it brings into the picture the role of optimization
in reliability. Since mathematical optimization is a core discipline of operations
research, Kuo and Prasad's entry is noteworthy on two counts. It exposes reliability
theorists to the relevance of optimization in system design, and it makes
this volume’s inclusion in a series in Operations Research and Management
Science germane. The third paper, by Swift, summarizes some of the more recent
work in assessing the reliability of systems from a statistical point of view.
Such work, motivated by the more recent concerns of infrastructure protection,
entails aspects of hierarchical modeling, computations via Markov chain
Monte Carlo, notions of interdependence (causal and cascading failures) and
the use of neural nets for reliability assessment. To whet the appetite of probability
theorists, Swift caps his entry by including in it the pitfalls of not paying
attention to Borel’s paradox, which can arise naturally in the context of system
reliability assessment.

Part  II  on  Recurrent  Events  consists  of  three  entries,  one  emphasizing  an 
engineering  scenario,  another  the  biomedical  scenario  and  the  third  a  matter 
of foundations. Arjas and Bhattacharjee demonstrate the importance of hierarchical
modeling as a way to borrow strength when dealing with heterogeneous
data. They motivate their work by starting with a real life example involving
valve failures in a nuclear plant and analyze the ensuing data using the Markov
chain Monte Carlo approach in a Bayesian context. Their analyses show how
modern statistical techniques, when coupled with sophisticated computational
approaches, can lead to useful practical insights. The second paper, by Doksum
and James, pertains to the use of a class of priors originally proposed by Doksum.
These priors are called neutral to the right and they have gained popularity
in Bayesian inference. Doksum and James do a Bayesian analysis of Barlow’s
total time on test transform and we are fortunate to receive this contribution.
The  third  paper  by  Pena  and  Hollander  is  both  archival  and  state-of-the-art.  The 
authors  introduce  a  general  class  of  models  for  the  treatment  of  recurrent  event 
data  that  arises  in  a  variety  of  contexts:  health  sciences,  engineering,  economics 
and  sociology.  The  models  are  able  to  incorporate  the  effects  of  interventions, 
accumulations  and  concomitance.  The  list  of  references  is  exhaustive  and  the 
material  is  expository  enough  for  any  novice  to  benefit.  It  offers  the  Bayesians 
a  new  window  of  opportunity  for  research  in  an  area  of  investigation  that  is  very 
general. 

Part  III  on  Information  and  Design  consists  of  three  entries  two  of  which  share 
a  common  theme.  The  aim  of  failure  data  analysis,  irrespective  of  whether  the 
data  arises  from  a  designed  life-testing  experiment  or  retrospectively  from  the 
field,  is  to  gain  information  or  knowledge.  The  latter  enables  one  to  make 
meaningful  predictions  about  future  lifetimes.  Discrimination,  entropy,  and 
information are the three legs on which the notion of "quantified knowledge"
rests. In the first entry, Ebrahimi and Soofi provide an authoritative synopsis of
the above triad with a focus on how it relates to reliability and life-testing. The
entry is rich in examples and almost complete vis-à-vis coverage; an exception
is the topic of how to design experiments for extracting the maximum amount
of information that one possibly can. All the same, Ebrahimi and Soofi's entry
should  motivate  researchers  in  reliability  to  consider  incorporating  information 
theoretic  ideas  in  reliability  analysis;  this  entry  provides  a  valuable  service.  The 
second  entry  by  Nair,  Escobar  and  Hamada  pertains  to  the  design  of  experiments 
for  gathering  performance  data,  with  a  view  towards  enhancing  reliability.  This 
point  of  view,  popularized  by  Taguchi,  advocates  an  active  philosophy  in  the 
sense  that  the  aim  of  reliability  analysis  should  be  to  improve  performance,  not 
to merely report observed performance (the passive view). Notions of accelerated
testing, degradation analysis, robustness and censoring are embodied in
the context of the theme of the entry. The third entry, by Wilson, Reese, Hamada
and Martz, has a futuristic motif. It pertains to the fusion of information about
lifetimes that arises from two different sources: physical experiments and computer
simulations. The latter is necessitated either by cost and time constraints
or by the impossibility of conducting physical tests, for example, the inability
to test nuclear weapons due to test ban treaties. The authors’ aim is achieved
by three modern technologies: hierarchical modeling, Bayesian pooling and
Markov chain Monte Carlo. The entry is both state-of-the-art and futuristic; it
reinforces the idea that new research and new paradigms are often driven by
new problems.

Part IV on the Failure Rate Function and Burn-in consists of two entries, the
first being a prelude to the second. The entry by Block and Savits addresses the
fundamental question upon whose answer depends the need for the second entry
by  Jensen  and  Spizzichino.  The  notion  of  the  failure  rate  function  is  perhaps 
unique  to  reliability  and  survival  analysis.  Indeed  statistical  reliability  can  be 
said  to  owe  its  existence  to  the  notion  of  failure  rate.  Engineers  often  claim  that 
components  and  systems  exhibit  a  failure  rate  function  whose  shape  is  like  that 
of  a  bath-tub.  The  decreasing  form  of  the  failure  rate  function  is  intriguing; 
specifically,  is  the  decrease  of  the  failure  rate  due  to  some  natural  phenomenon 
or  is  it  the  manifestation  of  something  else,  like  a  mixture  (be  it  physical  or 
be  it  psychological).  A  knowledge  of  the  form  of  the  failure  rate  function  is 
useful  for  commissioning  an  item  to  service.  This  is  the  theme  of  the  second 
entry  by  Jensen  and  Spizzichino.  In  the  first  entry,  Block  and  Savits  provide 
an  overview  of  the  various  forms  of  the  failure  rate  function  that  can  occur  due 
to mixing, irrespective of what causes the mixture. The treatment of Block
and Savits tends to be mathematical (but not necessarily technical); however,
their  entry  here  is  expository  and  relaxed.  This  entry  embodies  the  view  that 
the  good  mathematics  of  reliability  theory  should  be  driven  by  a  genuine  need. 
The  second  entry  by  Jensen  and  Spizzichino  exploits  the  kind  of  results  that 
the  first  entry  can  produce,  in  order  to  address  the  question  of  how  much  one 
should  test  an  item  (i.e.  the  notion  of  "burn-in")  prior  to  commissioning  it  for 
use. This entry explores several ramifications of the problem, and the material,
which tends to be technically sophisticated, embodies the notion of utilities
(via costs), Bayesian decision making under uncertainty and sequential control
theory: aspects of Operations Research and Management Science.

Part  V  on  Software  Reliability  and  Random  Environments  pertains  to  an  issue 
that  is  currently  important  and  will  continue  to  be  so.  As  systems  become  more 
and  more  software  driven  and  software  dependent,  unreliable  software  is  the 
critical  component  of  a  system.  The  first  entry  by  Chiang  and  Kuo  uses  some 
of  the  notions  and  ideas  that  are  useful  in  reliability,  to  manage  the  software 
development  process.  This  is  noteworthy  on  two  counts:  the  first  is  that  it  has 
often  been  claimed  by  experienced  software  engineers  that  it  is  the  process  that 
produces a piece of software that ensures its reliability, not just the innate
abilities of programmers to produce error-free code; the second is that by
using system reliability data to manage the process, Chiang and Kuo put into
practice Taguchi’s philosophy of reliability techniques playing an active role
for producing quality software. There is a parallel between this entry and that
of Jensen and Spizzichino vis-à-vis the optimum time to "burn-in" and the optimum
time to release software. Hopefully, these entries will provide some synergy
between the said topics. The second entry, by Ozekici and Soyer, pertains to
a generic topic in reliability, be it hardware or be it software, namely the
manner in which the effects of a random environment can be treated in the
context of assessing survivability. The entry, albeit focused on the context
of software, provides an overview of the several modern approaches - mostly
based  on  stochastic  process  theory  linked  with  Bayesian  methodology  -  that  are 
used  in  the  context  mentioned  above. 

Parts VI and VII pertain to some new and important avenues of application
of reliability, namely, composite materials, orthopedics, finance and forensics.
Of  these,  finance  and  orthopedics  seem  to  be  most  intriguing,  and  composite 
materials  the  most  crucial.  Lynch  and  Padgett  provide  an  overview  of  the 
recent  work  on  the  strength  of  fibre  bundles  that  they  have  been  doing  over 
the  past  few  years.  With  the  increased  emphasis  on  infrastructure  protection 
and  the  use  of  composite  materials,  this  type  of  research  has  an  added  urgency. 
Their  entry  pulls  together  several  related  topics  (such  as  pooling  failure  data, 
interacting  systems,  Gaussian  and  inverse  Gaussian  processes  and  inferential 
issues)  to  develop  a  coherent  package  that  should  appeal  to  both  engineers  and 
statisticians.  Their  list  of  references  will  support  this  latter  claim.  Wilson 
and  also  Lynn  introduce  a  new  frontier  for  the  application  of  reliability.  The 
former  focuses  on  a  specific  problem  in  orthopedics,  namely  the  life-length  of 
hip  replacements,  and  uses  a  hierarchical  approach  in  the  context  of  a  Bayesian 
analysis  to  assess  lifetimes  of  such  replacements.  He  illustrates  the  validity  of 
his  approach  by  considering  actual  data.  This  scenario  further  attests  to  the 
growing  importance  of  hierarchical  modeling  in  reliability  analysis.  Via  this 
work  we  note  the  importance  of  the  application  of  reliability  theory  to  such 
burgeoning  areas  as  biomedical  engineering.  Another  area  of  importance  for 
the  application  of  reliability  techniques  is  illustrated  in  Lynn’s  entry  which  is 
more  on  the  conceptual  front  than  on  the  practice  front.  He  introduces  notions 
in  fixed  income  instruments  -  like  bonds  -  and  discusses  their  risk  of  default 
(i.e.  failure).  He  then  points  out  opportunities  wherein  notions  of  reliability 
and  risk  could  come  into  play  and  discusses  some  possibilities.  He  then  moves 
to  the  notion  of  "derivatives"  and  again  points  out  scenarios  wherein  there 
could  be  an  interplay  between  reliability  and  finance.  Lynn’s  entry  is  important 
because  it  opens  a  new  window  of  opportunity  for  the  techniques  of  reliability. 
The  final  entry  is  on  warranties.  These  bring  into  play  the  various  notions  of 
probability (objective, logical and personal), utility and game theory, failure
models indexed by multiple scales, and forecasting using leading indicators.
Violations of warranty are often the cause of litigation, sometimes involving millions
of dollars, and the role of the reliability analyst as an expert witness becomes
central. Hence the label reliability in forensics.

Despite  the  broad  coverage  that  we  have  endeavored  to  encompass,  we  are 
aware  of  the  fact  that  there  may  be  other  topics  that  should  have  been  included. 
In  excluding  these  we  take  the  blame;  but  then  we  are  also  quite  delighted  with 
what  we  have  included. 

Finally,  we  would  like  to  take  this  opportunity  to  acknowledge  the  several 
years of support provided by The Army Research Office and the Office of Naval
Research,  for  sustaining  our  work  in  reliability  through  the  George  Washington 
University’s  Institute  for  Reliability  and  Risk  Analysis. 
Abstract

This  article  is  a  brief  description  of  some  landmark  advances  in  the  mathematics  of 
risk  and  reliability,  starting  with  the  initial  developments  of  probability  theory  in  the 
17th  century  to  the  ascendancy  of  reliability  theory  during  the  last  60  years. 
1 PREAMBLE


Writing  the  history  about  any  topic  is  both  challenging  and  demanding.  Demanding  because 
one  needs  to  acquire  a  broad  perspective  about  the  topic,  a  perspective  that  generally  comes 
over time and experience. The challenge of writing history comes from the matter of what
to  include  and  what  to  omit.  There  is  the  social  danger  of  offending  those  readers  who 
feel  that  their  work  should  have  been  mentioned  but  was  not.  But  the  moral  obligation  of 
excluding  the  works  of  those  who  are  no  more  with  us  is  much  greater.  Writers  of  history 
must  therefore  confront  the  challenge  and  draw  a  delicate  line.  This  task  is  made  easier 
with  the  passage  of  time,  because  the  true  impact  of  a  signal  contribution  is  felt  only  after 
time  has  elapsed.  By  contrast,  the  impact  of  work  that  is  incremental  or  marginal  can  be 
judged  immediately.  It  is  with  the  above  in  mind  that  the  history  that  follows  is  crafted. 
The word “select” in the title of this contribution is deliberate; it reflects the judgement of
the authors. Hopefully, the delicate line mentioned before has been drawn by us in a just
and  honourable  manner.  All  the  same,  our  apologies  to  those  who  may  feel  otherwise,  or 
whose  works  we  have  accidentally  overlooked. 

2  INTRODUCTION 

From a layperson’s point of view, a viewpoint that predates history, the term “risk” connotes
the possibility that an undesirable outcome will occur. However, the modern technical
meaning of the term risk is different. Here, risk is the sum, over all possible outcomes of an
action, of the product of each outcome's probability and its utility (or consequence).
Utilities  are  numerical  values  of  consequences  on  a  zero  to  one  scale.  Indeed,  utilities  are 
probabilities  and  obey  the  rules  of  probability  (Lindley,  1985,  page  56).  They  encapsulate 
one’s  preferences  between  consequences.  Thus  the  notion  of  risk  entails  the  twin  notions  of 
probability and utility. Some adverse outcomes are caused by the failure or the malfunctioning
of certain entities, biological or physical. For such adverse outcomes, the probability
of failure of the entity in question is known as the entity’s unreliability; its reliability is the
probability of non-failure for a specified period of time. In biomedical contexts, wherein
the entity is a biological unit, the term survivability is used instead of reliability. Thus assessing
reliability (or survivability) is de facto assessing a probability, and reliability theory
pertains  to  the  methods  and  techniques  for  doing  such  assessments.  The  linkage  between 
reliability  and  risk  is  relatively  new  (Singpurwalla,  2006).  It  is  brought  about  by  the  point 
of  view  that  the  main  purpose  of  doing  a  reliability  analysis  is  to  make  sound  decisions 
about  preventing  failure  in  the  face  of  uncertainty.  To  the  best  of  our  knowledge,  the  first 
document  that  articulates  this  position  is  Barlow  et  al.  (1993).  Thus  we  see  that  probability, 
utility,  risk,  reliability  and  decision  making  are  linked,  with  probability  playing  a  central 
role,  indeed  the  role  of  a  germinator.  Our  history  of  risk  and  reliability  must  therefore  start 
with  a  history  of  probability.  Probability  is  a  way  to  quantify  uncertainty.  Its  origins  date 
back  to  16th  century  Europe  and  discussions  about  its  meaning  and  interpretation  continue 
until  the  present  day.  For  a  perspective  on  these,  the  review  articles  by  Kolmogorov  (1969) 
and  Good  (1990)  are  valuable.  The  former  wholeheartedly  subscribes  to  probability  as  an 
objective chance, and the latter makes the point that probability and chance are distinct
concepts.  The  founding  fathers  of  probability  were  not  motivated  by  the  need  to  quantify 
uncertainty;  they  were  more  concerned  with  action  than  with  interpretation.  This  enables 
us  to  divide  the  history  of  probability  into  three  parts:  until  1750,  1750-1900,  and  from 
1900. These reflect, in our opinion, three reasonably well-defined periods of development
of  the  mathematics  of  uncertainty  which  we  label:  foundations,  maturation  and  expansion 
of  applicability.  Some  excellent  books  on  the  history  of  probability  are  by  Hald  (1990b, a), 
Stigler  (1990)  and  von  Plato  (1994).  Since  the  history  of  probability  is  the  background  for 
the  history  of  risk  and  reliability,  a  reading  of  these  and  the  exhaustive  references  therein 
should  provide  risk  and  reliability  analysts  a  deeper  appreciation  of  the  foundations  of  their 
subject.
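In symbols (notation ours), the technical definition of risk given at the start of this section reads, for an action $a$ with possible outcomes $\omega_1, \ldots, \omega_n$,

$$R(a) = \sum_{i=1}^{n} P(\omega_i \mid a)\, U(\omega_i),$$

where $U(\omega_i) \in [0, 1]$ is the utility of outcome $\omega_i$. For instance, an action with two outcomes of probabilities $0.9$ and $0.1$ and utilities $1$ and $0.2$ has $R(a) = 0.9 \times 1 + 0.1 \times 0.2 = 0.92$.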


3 TO 1750: THE FOUNDATIONS OF PROBABILITY

Insurance  was  the  first  place  where  the  traditional  notion  of  risk  had  to  be  quantified.  Its 
use can be traced back 4 millennia to ancient China and Babylonia, where traders took on
the  risks  of  the  caravan  trade  by  taking  out  loans  that  were  repaid  if  the  goods  arrived.  The 
ancient  Greeks  and  Phoenicians  used  marine  insurance,  while  the  Romans  had  a  form  of  life 
insurance  that  paid  for  the  funeral  expenses  of  the  holder.  However  there  is  no  evidence 
that  insurance  was  a  common  practice  and  indeed  it  disappeared  with  the  fall  of  the  Roman 
Empire.  It  took  the  growth  of  towns  and  trade  in  Renaissance  Europe,  where  risks  such 
as  shipwreck,  losses  from  fire  and  even  kidnap  ransom  worried  the  wealthy,  for  insurance 
to  develop  once  again.  But  it  was  the  development  of  probability  in  the  17th  century  that 
finally  saw  the  foundation  for  the  mathematics  of  risk,  and  where  our  brief  history  can 
really  begin.  We  should  mention  first  that  the  mathematisation  of  uncertainty  can  be  traced 
back to Girolamo Cardano (1501-1576). But it was the short correspondence between
Pierre  de  Fermat  (1608-1672)  and  Blaise  Pascal  (1623-1662)  that  began  the  development 
of  modern  probability  theory.  Their  correspondence  concerned  a  gambling  question  called 
“The  Problem  of  Points”,  which  is  to  determine  the  fair  bet  for  a  game  of  chance  where 
each  player  has  an  equal  chance  of  winning,  and  the  bet  is  won  as  soon  as  either  player 
wins the game a predetermined number of times. The difficulty arises if the number of
games  to  win  is  different  for  each  player;  Fermat's  and  Pascal’s  correspondence  led  to  a 
solution. Meanwhile, a contemporary of both, Christiaan Huygens (1629-1695), was one of
the  earliest  scientists  to  think  mathematically  about  risk.  He  was  motivated  by  problems  in 
annuities,  which  at  that  time  were  common  means  for  states  and  towns  to  borrow  money. 
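The counting argument behind the Fermat-Pascal solution can be sketched in a few lines. The sketch assumes, as stated above, that each game is won by either player with probability 1/2, and takes the fair share of the stake to be each player's probability of winning the match; the function name `points_share` is ours. If player A still needs $m$ games and player B needs $n$, at most $m + n - 1$ further games decide the match, and A wins exactly when A takes at least $m$ of them.

```python
from fractions import Fraction
from math import comb


def points_share(m, n):
    """Probability that player A wins the match when A still needs m games and
    B needs n, each game being won by either player with probability 1/2."""
    games = m + n - 1  # the match is certainly decided within this many games
    # A wins exactly when A takes at least m of those games
    favourable = sum(comb(games, k) for k in range(m, games + 1))
    return Fraction(favourable, 2 ** games)


print(points_share(1, 2), points_share(2, 3))  # prints 3/4 11/16
```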
of 100 [‘quick conceptions’]

  there dies within the first six years   36
  The next ten years, or Decad            24
  The second Decad                        15
  The third Decad                          9
  The fourth                               6
  The next                                 4
  The next                                 3
  The next                                 2
  The next                                 1

[“perhaps but one surviveth 76”]

Table  1:  Reproduction  of  the  table  that  appears  in  Graunt  (1662). 
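Graunt's counts of deaths can be accumulated into the numbers still alive at the start of each period. The sketch below reads the periods as ages 0-6, 6-16, 16-26 and so on in decades (our interpretation of "Decad"), which reproduces the single survivor at 76 quoted in the table.

```python
# Deaths per period out of 100 "quick conceptions", as listed in Table 1 (Graunt, 1662)
deaths = [36, 24, 15, 9, 6, 4, 3, 2, 1]
# Start age of each period: the first six years, then successive decades (our reading)
start_ages = [0, 6, 16, 26, 36, 46, 56, 66, 76]

alive = 100
survivors = {}  # age -> number still alive at that age
for age, d in zip(start_ages, deaths):
    survivors[age] = alive
    alive -= d

print(survivors)
# prints {0: 100, 6: 64, 16: 40, 26: 25, 36: 16, 46: 10, 56: 6, 66: 3, 76: 1}
```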

Huygens wrote up the solution of Fermat and Pascal, and is thus credited with publishing
the first book on probability theory (Huygens, 1657). Without the benefit of Fermat’s and
Pascal’s theory, Graunt produced the first mortality table by decade (Graunt, 1662), from
which he concluded that only 1% of the population survived to 76 years. Table 1 shows
this brilliant if unsophisticated effort; see Seal (1980) for a discussion of its use. Graunt’s
work coincided with the beginnings of property insurance as we know it today. Following
the Great Fire of London in 1666, which destroyed about 13,000 houses, Nicholas Barbon
opened an office to insure buildings. In 1680, he established England’s first fire insurance
company, “The Fire Office,” to insure brick and frame homes; it also ran the first fire
brigade. Edmond Halley constructed the first proper mortality table, based on the
statistical laws of mortality and compound interest (Halley, 1693). James Dodson corrected
the table in 1756, making it possible to scale the premium rate to age; previously the rate
had been the same for all ages.

The idea of a fair price was linked to probability by Jacob Bernoulli (1654-1705), in work
that was published posthumously by his nephew Nicholas (Bernoulli, 1713). This work is
important because it was the first substantial treatment of probability, and it contained
the general theory of permutations and combinations, the weak law of large numbers and
the binomial theorem. What interested Bernoulli was applying the Fermat-Pascal idea of a
fair bet to other problems where the idea of probability had meaning. He argued that one’s
opinion about whether any event will occur is analogous to a game of chance in which
betting on a certain outcome leads to a fair bet; the fair bet then represents the degree of
certainty that one attaches to the event occurring. This analogy between games of chance
and one’s opinions also appears to have been made at the time of Fermat and Pascal
(Arnauld and Nicole, 1662). The law of large numbers was particularly important for this
argument because Bernoulli realized that, in practical problems, fair prices could not be
deduced exactly and approximations would have to be found. This allowed him to justify
approximating the probability of an event by its relative frequency. Thus in Bernoulli’s
ideas we see parts of the two currently dominant interpretations of probability: subjective
degree of belief and relative frequency. The relative frequency idea was further developed
by de Moivre (1718), who proposed the ideas of independent events, the summation rule,
the multiplication rule and the central limit theorem. This connection between fair prices
and probability is the basis for insurance pricing.

Bernoulli’s and de Moivre’s work came during a period of rapid development of the
insurance market, spurred on by the growth of maritime commerce in the 17th and 18th
centuries. We have seen that fire insurance had been available since the Great Fire of
London, but up to the 18th century most insurance was underwritten by individual investors
who stated how much of the loss risk they were prepared to accept. This practice continues
to this day at Lloyd’s of London, which began around 1688 in Edward Lloyd’s coffeehouse in
Tower Street, London, a popular meeting place for the shipping community to discuss
insurance deals among themselves. Soon after the publication of Bernoulli’s work,
corporations began to engage in insurance. They were first chartered in England in 1720,
and in 1735 the first insurance company in the American colonies was founded at
Charleston, S.C. So, by 1750 all the basic ideas of probability necessary for quantifying risk
(probability distributions, expected values, the idea of a fair price, and mortality) were in
place, and were in use in insurance.

4 1750-1900: PROBABILITY MATURES

Post 1750, the first notable name is that of Thomas Bayes (1702-1761) and his famous
essay on inverse probability (Bayes, 1764). His main contribution was to articulate the
multiplication rule that allows conditional probabilities to be computed from unconditional
ones; vitally, this permitted Laplace (1749-1827) to derive the law of total probability and
Bayes’ law. In contrast to de Moivre, Laplace thought that probability was a rational belief,
and that the rules of probability and expectation followed naturally from this interpretation
(Laplace, 1812, 1814). Poisson (1781-1840) did much work on the technical and practical
aspects of probability, and greatly expanded its scope and applications. His main
contribution was a generalization of Bernoulli’s theorem; his seminal work (Poisson, 1837)
also introduced the Poisson distribution. While Poisson agreed with Laplace’s rational belief
interpretation of probability, criticisms of this view were raised in the second half of the
19th century. John Venn (1834-1923) revived the frequency interpretation of probability,
hinted at by Bernoulli, but went further in stating that frequency was the starting point for
defining probability (Venn, 1866).

So far we have noted little attempt to quantify the consequences of adverse events through
utility, and hence to manage risks in a coherent manner. However, two developments
deserve mention. First, the idea of utility did arise through Daniel Bernoulli in 1738 and
utilitarian philosophers such as Bentham (1748-1832). They proposed rules of rationality
stating that individuals desire things that maximise their utility, where positive utility is
defined as the tendency to bring pleasure, and negative utility as the tendency to bring pain
(Bentham, 1781). Second, the industrial revolution meant that manufacturing and transport
carried far graver risks than before, and we do see the first attempts at risk management
through regulation. In the United Kingdom, the Factory Act of 1802 (known as the “Health
and Morals of Apprentices Act”) started a sequence of such acts that attempted to improve
health and safety at work. Following a rail accident that killed 80 people near Armagh, in
what is now Northern Ireland, the Regulation of Railways Act 1889 made fail-safe brakes
mandatory, as well as block signalling. All the main areas of insurance (life, marine and fire
insurance) continued to grow throughout this period. After 1840, with the decline of
religious prejudice against the practice, life insurance entered a boom period in the United
States. Many friendly or benefit societies were founded to insure the life and health of their
members. The close of the 19th century finally allows us to say something about
mathematical reliability theory: Pearson (1895) names the exponential distribution for the
first time.


5 FROM 1900 TO THE PRESENT: UTILITY AND RELIABILITY ENTER

The first half of the twentieth century saw the beginning of the modern era of probability;
Kolmogorov (1903-1987) axiomatized probability and in doing so freed it from the confusions
of interpretation (Kolmogorov, 1956). It also saw many developments in the frequency
interpretation of probability, and several advances in subjective probability. Von Mises
(1883-1953) wrote a paper extolling the virtues of the frequentist interpretation of
probability (von Mises, 1919). Through his work and that of Karl Pearson (1857-1936) and
Fisher (1890-1962), methods of inference under the frequency interpretation of probability
became the dominant approaches to data analysis and prediction. However, at about the
same time there were breakthrough developments in the subjective approach to statistical
inference and decision making. Noteworthy among these was the work of Ramsey (1931),
who proposed that subjective belief and utility are the basis of decision making, and argued
for the non-separability of probability from utility. Jeffreys’ (1891-1989) highly influential
book on probability theory combined the logical basis of probability with the use of Bayes’
law as the basis of statistical inference (Jeffreys, 1939). At about this time, de Finetti
(1906-1985), independently of Ramsey’s work, arrived at similar subjectivist views and
produced his seminal work on probability (de Finetti, 1937), later translated into English
(de Finetti, 1974). De Finetti is best remembered for these writings, and for his bold
statement that “Probability Does Not Exist”!
The period 1900-1950 also saw the laying of the foundations of modern utility theory, from
which a prescription for normative decision making emerges. The mathematical basis of
today’s quantitative risk analysis is indeed normative decision theory. Impetus for a formal
approach to utility came from von Neumann and Morgenstern (1944), with their interest in
rational choice, game theory, and the modelling of preferences. This line of work was
brought to its definitive conclusion by Savage (1954), who proposed a system of axioms that
linked together the ideas of Ramsey, de Finetti, and von Neumann and Morgenstern.
Readable accounts of Savage’s brilliant work are in DeGroot (1970) and Lindley (1985), two
highly influential voices in the Bayesian approach to statistical inference and decision
making. Not to be overlooked is the 1950 treatise of Wald (1902-1950), whose approach to
statistical inference was decision theoretic. However, unlike that of Savage, Wald’s work did
not entail the use of subjective prior probabilities on the states of nature.

Hardly mentioned up to now is the mathematical
and the statistical theory of reliability. This is because it is only in the 1950s and 1960s
that reliability emerged as a distinct field of study. The initial impetus for this field was
driven by the demands of the then newer technologies in aviation, electronics, space, and
strategic weaponry. Some of the landmark events of this period are: Weibull’s (1887-1979)
advocacy of the Weibull distribution for metallurgical failure (Weibull, 1939, 1951), the
statistical analysis of failure data by Davis (1952), the proposal of Epstein and Sobel (1953)
that the exponential distribution be used as a basic tool for reliability analysis, the work of
Grenander (1956) on estimating the failure rate function, the book of Gumbel (1958) on
the application of the theory of extreme values for describing failures caused by extremal
phenomena such as crack lengths, floods and hurricanes, the approach of Kaplan and Meier
(1958) for estimating the survival function under censoring, and the introduction in Watson
and Wells (1961) of the notion of burn-in. Some, though not all, of this work was described
in what we consider to be the very first few books on reliability: Bazovsky (1961), Lloyd and
Lipow (1962), and Zelen (1963). Initially the statistical community was slow to embrace
the Weibull distribution as a model for describing random failures; indeed the Journal of
the American Statistical Association rejected Weibull’s 1951 paper. This is despite the fact
that the Weibull distribution is a member of the family of extremal distributions (Gnedenko,
1943). Subsequently, however, the popularity of the Weibull grew because of the papers of
Lieblein and Zelen (1956), Kao (1958, 1959), and later the inferential work of Mann (1967,
1968, 1969). Today, along with the Gaussian and the exponential distributions, the Weibull
is one of the most commonly discussed distributions in statistics. Whereas the emphasis of
the works mentioned above has been on the statistical analysis of lifetime data, progress in
the mathematical and probabilistic aspects was also made during the 1950s and 1960s. A
landmark event is Drenick (1960) on the failure characteristics of a complex system with the
replacement of failed units. It started a line of research in reliability that focused on the
probabilistic aspects of components and systems; in a similar vein is a book by Cox (1962).
The next major milepost was the paper by Birnbaum et al. (1961) on the structural
representation of systems of components; inspiration for this work can be traced to the
classic paper of Moore and Shannon (1956) on reliable relays. This was followed by the
paper of Barlow et al. (1963) on monotone hazard rates. This work was highly influential in
that it spawned a generation of researchers who explored the probabilistic and statistical
aspects of monotonicity from different perspectives. Much of this work is summarised in the
two books of Barlow and Proschan (1965, 1975). There were other notable developments
during the late 1960s and mid-1970s, some on the probabilistic aspects, and others on
the statistical aspects. With regard to the former, Marshall and Olkin (1967) proposed a
multivariate distribution with exponential marginals for describing dependent lifetimes. The
noteworthy features of this work are that the distribution was motivated using arguments
that are physically plausible, and that its properties bring out some subtle aspects of
probability models. At about the same time, Esary et al. (1967) proposed a notion of
dependence that they called association. This notion was motivated by problems of system
reliability assessment, and the idea was general enough to attract mathematical
statisticians and probabilists to develop it further. During this period, and perhaps earlier,
important work in reliability was also done in the Soviet Union. Indeed, Kolmogorov (1969),
in his expository papers on statistics, often used examples from reliability and lifelength
studies to motivate his material. The book by Gnedenko et al. (1969) and the more recent
review by Ushakov (2000) give a perspective on the Soviet work in reliability. Some other
developments in that period were the papers of Cox (1972) and of Esary et al. (1973), and
the book by Mann et al. (1974). Cox’s highly influential paper provided a means for relating
the failure rate to covariates. A similar strategy was used in Singpurwalla (1971), in the
context of accelerated testing. The paper by Esary et al. on shock models and wear processes
was remarkable in two respects. First, it addressed a phenomenon of much interest to
engineers and produced some elegant results. Second, it paved the way for using stochastic
processes to obtain probability models of failure (Singpurwalla, 1995). The book by Mann et
al. integrated the probabilistic and statistical techniques used in reliability that were
prevalent at that time, and in doing so created a template for the books that followed. It
was also the first of its kind to make a case for using Bayesian methods for reliability
assessment.

After the mid-1970s, interest in reliability as an academic discipline took a leap, and
several books and papers began to appear, and continue to appear today. Notable among the
former are the books by Lawless (1982), Martz and Waller (1982), Nelson (1982, 1990),
Gertsbakh (1989), Crowder et al. (1991), Meeker and Escobar (1998), Aven and Jensen
(1999), Singpurwalla and Wilson (1999), Hoyland and Rausand (2004) and Saunders (2006).
With the exception of Martz and Waller (1982) and Singpurwalla and Wilson (1999), the
statistical paradigm guiding the material in the above books has been sample theoretic (i.e.
non-Bayesian). In
terms of signal developments during the period, two notable ones seem to be Natvig’s (1982)
suggestion to consider multi-state systems, and the consideration of subjective Bayesianism
in reliability. The latter was triggered by Barlow’s interpretation of decreasing failure rates
as caused by subjective mixing (Barlow, 1985), and brought to its conclusion by Gurland and
Sethuraman (1995); also see the discussion in Lynn and Singpurwalla (1997) of Block and
Savits (1997). The book by Spizzichino (2001) is an authoritative treatment of the
generation of subjective probability models for lifetimes based on exchangeability. Some
other developments in reliability have come about from the biostatistical perspective of
survival analysis. Notable among these are Ferguson (1973), with its advocacy of the
Dirichlet process for survival analysis, and Aalen (1978), with its point process perspective
and the martingale approach to modelling lifetimes. The former has been exploited by
Sethuraman (1994), and the latter by Pena and Hollander and by Hollander and Pena (2004),
in a variety of contexts that are germane to reliability.

To conclude, the last sixty years have seen
two trends in risk. First, the idea of risk has spread to many fields outside the traditional
areas of insurance and actuarial science. It is now an important idea in medicine, public
health, law, science and engineering. Second, driven by its increasing use and by the growth
of computing and data-collecting power, increasingly complex quantifications of risk and
reliability have been developed to exploit growing quantities of data; reliability and risk
models, inference and prediction with those models, and numerical methods have all
advanced enormously. Since the 1960s in particular, the literature on reliability, risk and
survival analysis has grown in journals that cover statistics, philosophy, medicine,
engineering, law, finance, the environment and public policy. Annual conferences on risk in
all these subject areas have been held for the last 30 years. To these two trends we might
add that the magnitude of the risks being quantified and managed has increased over the
last century; environmental pollution, intensive food production and the nuclear industry are
examples. The same trends can be discerned in reliability theory as in risk: the spread of
application into new fields and the impact of increasing computing power and availability of
data. It is worth comparing seminal books on statistical reliability of the 1960s, such as
Bazovsky (1961) and Barlow and Proschan (1965), with one from the current decade
(Singpurwalla, 2006) to see how much the field has changed. The debate over the
interpretation of probability, and over uncertainty quantification more generally, continues.
The important work of Savage (1954), DeGroot (1970) and de Finetti (1974) publicized the
justifications for the laws of probability through their interpretation as a subjective degree
of belief. This, along with the practical development of the necessary numerical tools, has
increased the use of subjective probability and Bayesian inference in the last 30 years. The
strong link between risk, reliability, and the mathematical tools of probability and decision
making, which has existed for 400 years, looks set to continue.